Edited by: Arjen van Ooyen, VU University Amsterdam, Netherlands
Reviewed by: Andras Jakab, Medical University of Vienna, Austria; Emanuele Olivetti, Bruno Kessler Foundation, Italy
*Correspondence: Francois Rheault
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Recently proposed tractography and connectomics approaches often require a very large number of streamlines, on the order of millions. Generating, storing and interacting with these datasets is currently quite difficult, since they require a lot of memory and processing time. Compression is a common approach to reduce data size. One recently proposed compression approach consists of removing collinear points from the streamlines. However, removing points from streamlines produces files that cannot be robustly post-processed or interacted with by existing tools, which are for the most part point-based. The aim of this work is to improve visualization, interaction and tractometry algorithms so that they robustly handle compressed tractography datasets. Our proposed improvements are threefold: (i) an efficient loading procedure to improve visualization (reducing memory usage by up to 95% for a 0.2 mm step size); (ii) interaction techniques robust to compressed tractograms; and (iii) tractometry techniques robust to compressed tractograms, eliminating biases in tract-based statistics. The present work demonstrates the need to correctly handle compressed streamlines to avoid biases in future tractometry and connectomics studies.
Diffusion magnetic resonance imaging (dMRI) tractography has greatly helped to advance the understanding of brain structural connectivity (Clayden,
This kind of memory usage makes these datasets barely usable on the majority of computers, which have only 8 or 16 GB of RAM. Another problem arises when transferring these files to collaborators: the transfer can take a long time and use a lot of bandwidth. Finally, in many instances users want to manually segment streamlines in a 3D interactive visualization application. These manual virtual streamline dissections (Catani et al.,
Another type of approach is to use a fast streamlines clustering algorithm (Olivetti et al.,
For iterative tractography reconstruction methods, the step size (distance between two consecutive 3D points) has a crucial impact on the quality of the reconstructed streamlines. Most algorithms use a uniform step size. With probabilistic tracking, which uses probabilities to determine the next stepping direction, a step size that is too small makes the streamlines look “noisy” because of the small differences in orientation between consecutive segments. Too large a step size creates an overshooting problem in curved areas, resulting in an inability to reconstruct smaller U-shaped streamlines (Tournier et al.,
At a 1 mm isotropic resolution, a common white matter mask (obtained from the T1 segmentation or a thresholded FA map) contains nearly 500 k voxels. At the same resolution, a white matter and gray matter interface mask (Smith et al.,
In the field of connectomics, streamlines are often simplified to their 2 endpoints to represent the structural connectivity of the brain (Hagmann et al.,
Since current tools and methods used to interact with streamlines are point-based (Soares et al.,
The goal of this work is to integrate the linearization process presented in Presseau et al. (
A simplified representation of a streamline containing 75 points (dotted line) linearized and now containing 8
We successfully visualize 2M streamlines with a response rate between 15 and 25 frames per second. We also show how to correctly filter and segment these compressed streamlines using regions of interest. Finally, we adapt tractometry methods to correctly handle linearized streamlines. In some cases, the proposed enhancements yield results that are even more robust than the classical point-based techniques.
Diffusion-weighted images (DWI) were acquired on a single volunteer along 64 uniformly distributed directions using a
For the experiments of Section 2.4, 27 bundles were extracted using the TractQuerier (Wassermann et al.,
The load-time linearization algorithm was implemented in MI-Brain (
For efficient loading and visualization, the linearization of Presseau et al. (
Perfectly collinear points are rare. This is why a maximum error threshold was used in the original method of Presseau et al. (
The first row represents how the maximum error threshold affects compression, while the second row represents how the maximum linearization distance affects compression. Case A occurs when all points are within the maximum error threshold and maximum linearization distance, meaning that the candidate point can be discarded. Case B arises when the current point breaks the maximum error threshold constraint, while respecting the maximum linearization distance. Case C is the opposite of Case B.
In this work, an additional parameter is introduced to reduce the compression time and facilitate the adaptation of streamlines selection using ROIs. The maximum linearization distance (MLD) constraint limits the processing needed at load-time by avoiding long backward verifications. Applying this constraint helps reduce the number of operations at low cost. For the maximum linearization distance, three values were tested: 5, 10, and 25 mm. The behavior of these two parameters is represented in Figure
There are 3 possible cases when iterating over the points of the streamline. Case IDs refer to Figure
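To make these cases concrete, here is a minimal load-time linearization sketch in Python with NumPy. The function and parameter names (linearize, max_error, max_dist) are ours for illustration, not the MI-Brain implementation, and the error measure used here is the perpendicular distance of each skipped point to the candidate segment:

```python
import numpy as np

def linearize(points, max_error=0.2, max_dist=10.0):
    """Drop near-collinear points from one streamline (an N x 3 array).

    Case A: all skipped points lie within `max_error` (mm) of the segment
    joining the last kept point to the candidate, and that segment is at
    most `max_dist` (mm) long -> the intermediate point is discarded.
    Cases B and C: one of the two constraints is broken -> the previous
    point is kept and a new segment is started.
    """
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    kept = [0]
    for i in range(2, len(points)):
        a, b = points[kept[-1]], points[i]
        seg = b - a
        seg_len = np.linalg.norm(seg)
        skipped = points[kept[-1] + 1:i]
        # Perpendicular distance of every skipped point to segment a->b.
        if seg_len > 0:
            dist = np.linalg.norm(np.cross(skipped - a, seg / seg_len), axis=1)
        else:
            dist = np.linalg.norm(skipped - a, axis=1)
        if seg_len > max_dist or np.any(dist > max_error):
            kept.append(i - 1)
    kept.append(len(points) - 1)  # both endpoints are always preserved
    return points[kept]
```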
Streamlines segmentation was done in MI-Brain on a 500 k streamlines tractogram, which was compressed using maximum error threshold values between 0.1 and 0.5 mm. Two cubic ROIs of 5 × 5 × 5 mm were used: one was placed in the region of the anterior posterior midbody of the corpus callosum (CC), while the other was placed in the corticospinal tract (CST). These positions were chosen under the hypothesis that the compression ratio would vary according to the curvature of the bundle: since the CC is highly curved, it is less compressible, while the CST, with its low curvature, should be easily compressible, as illustrated in Figure
A simplified representation of the CC
The term ROI is used to denote boxes and ellipsoids of any size and orientation, as well as complex surfaces, in the graphical application. Our selection method works on any type of ROI since it is performed using polygon meshes that do not rely on a 3D grid. This is the basis of this type of method: it works directly in 3D space and searches for at least one intersection with a geometrical object of any type. This differs from the method presented in Section 2.4, which works in a grid and tries to identify all voxels touched by a streamline.
Using a point-based approach on compressed streamlines will severely affect the results, since a high number of points are missing from the dataset. The solution to this problem is to rely on the line segments between the points rather than the points themselves. By computing the intersection between each line segment and the ROIs (represented by polygon meshes in the graphical application), the dependence on a small and uniform sampling along the streamlines is removed. However, a naive segment-based approach would require a large amount of computing time, as millions of streamlines contain tens of millions of line segments. To keep interaction responsive, we use a progressive approach that avoids useless computation while still generating valid results. This approach is based on a coarse point-based filtering (as is done in most visualization tools), followed by a segment-based filtering to complete the set of selected streamlines.
This method, as implemented in MI-Brain, starts by finding all the points contained in the bounding box of the ROI using an octree, which is a fast spatial partitioning data structure (Samet and Webber,
The second phase of the progressive approach attempts to find streamlines that were missed by the point-based technique without going through all the segments in the dataset. Using the known maximum linearization distance described in Section 2.2, the ROI bounding box is extended by this maximum linearization distance into an extended neighborhood region. This extended neighborhood is guaranteed to contain at least one point of any segment intersecting the polygonal mesh, since segments cannot be longer than the maximum linearization distance. Points contained in the extended neighborhood are then extracted from the previously computed octree. Points belonging to already selected streamlines are not tested. The neighboring segments supported by the remaining points are then tested for intersection with the polygon mesh, using VTK functions.
To obtain all intersections correctly, the extended neighborhood size must be based on the maximal segment length in the dataset. The maximum linearization distance parameter introduced in Section 2.2 reduces the maximal size of the neighborhood. To accelerate the selection process, the size of the extended neighborhood can be based on the mean segment length of the dataset, but the results will not be complete. As demonstrated in Figure
Representation of a cubic ROI and its extended neighborhood with three streamlines passing through it. Case A would result in a success for both the point-based and the segment-based methods. For Case B, only the segment-based method would succeed, using the mean segment length heuristic or using the maximal segment length calculation. Finally, Case C would only be selected by a segment-based method using the maximal segment length calculation.
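The progressive approach described above can be illustrated for the simplest case of an axis-aligned cubic ROI (MI-Brain handles arbitrary polygon meshes through VTK; the function names and the slab intersection test below are illustrative, and a brute-force scan stands in for the octree):

```python
import numpy as np

def points_in_box(points, lo, hi):
    """Boolean mask of the points lying inside the box [lo, hi]."""
    return np.all((points >= lo) & (points <= hi), axis=-1)

def segment_hits_box(p0, p1, lo, hi):
    """Slab test: does the segment p0->p1 intersect the box [lo, hi]?"""
    d = p1 - p0
    t0, t1 = 0.0, 1.0
    for k in range(3):
        if abs(d[k]) < 1e-12:
            if p0[k] < lo[k] or p0[k] > hi[k]:
                return False
        else:
            ta = (lo[k] - p0[k]) / d[k]
            tb = (hi[k] - p0[k]) / d[k]
            ta, tb = min(ta, tb), max(ta, tb)
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True

def select_streamlines(streamlines, lo, hi, max_seg_len):
    """Phase 1: point-based pass; phase 2: segment tests restricted to the
    box extended by the maximum linearization distance `max_seg_len`."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    selected = set()
    for i, s in enumerate(streamlines):           # phase 1 (coarse)
        if points_in_box(s, lo, hi).any():
            selected.add(i)
    ext_lo, ext_hi = lo - max_seg_len, hi + max_seg_len
    for i, s in enumerate(streamlines):           # phase 2 (segments)
        if i in selected:
            continue  # already selected: no segment test needed
        for j in np.flatnonzero(points_in_box(s, ext_lo, ext_hi)):
            if ((j > 0 and segment_hits_box(s[j - 1], s[j], lo, hi)) or
                    (j + 1 < len(s) and segment_hits_box(s[j], s[j + 1], lo, hi))):
                selected.add(i)
                break
    return sorted(selected)
```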
As with the streamlines selection problem, statistics and maps computed from streamlines with non-uniform step sizes can be biased. This was clearly sketched in Figure
This operation is radically different from the one proposed in Section 2.3 since it is a rasterization operation on a uniform 3D grid as opposed to an intersection with a complex polygon mesh. This rasterization operation aims to find all voxels traversed by streamlines, even when no point of the streamline lies in a voxel and only one of its segments crosses it (Figure
Of course, when this mapping from streamline to voxel is computed using only the points of the streamlines, the techniques become susceptible to biases arising from varying step sizes. This will be referred to as the voxel mapping bias. Varying step sizes can have many causes, such as linearization (as presented earlier) or global tractography algorithms (Kreher et al.,
To overcome the voxel mapping bias caused by variable step sizes, we replace the normal point-based technique by a Bresenham-style line integration technique (Bresenham,
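As an illustration, the following sketch uses an Amanatides-Woo style 3D traversal, one standard way of realizing a Bresenham-style line integration; the names are ours, a unit isotropic grid is assumed, and streamline points are assumed to be expressed in voxel coordinates:

```python
import numpy as np

def voxels_crossed(p0, p1):
    """List every voxel index crossed by the segment p0->p1 on a unit grid,
    including voxels that contain no streamline point at all."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    v = np.floor(p0).astype(int)
    end = np.floor(p1).astype(int)
    d = p1 - p0
    step = np.sign(d).astype(int)
    with np.errstate(divide='ignore', invalid='ignore'):
        t_delta = np.where(d != 0, 1.0 / np.abs(d), np.inf)
        next_bound = np.where(step > 0, v + 1, v)
        t_max = np.where(d != 0, (next_bound - p0) / d, np.inf)
    out = [tuple(v)]
    # A 3D grid traversal makes exactly |dx|+|dy|+|dz| voxel transitions.
    for _ in range(int(np.abs(end - v).sum())):
        axis = int(np.argmin(t_max))  # next voxel boundary to be crossed
        v[axis] += step[axis]
        t_max[axis] += t_delta[axis]
        out.append(tuple(v))
    return out

def visit_count(streamlines, shape):
    """Visitation-count map: each streamline increments a voxel at most once,
    independently of the (possibly non-uniform) step size."""
    counts = np.zeros(shape, dtype=np.int32)
    for s in streamlines:
        voxels = set()
        for a, b in zip(s[:-1], s[1:]):
            voxels.update(voxels_crossed(a, b))
        for vox in voxels:
            counts[vox] += 1
    return counts
```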
Using the dataset as described in Section 2.1 and both the basic, point-based technique and our proposed Bresenham-style technique, we calculated the mean values of diffusion metrics (AD, FA, MD, RD) for each bundle and error threshold combination. The basic point-based technique used the implementation available in Dipy (Garyfallidis et al.,
As seen in Figure
RAM space used by the application in relation to the maximum error threshold. Deterministic tractogram
For an analysis offering a fair comparison to deterministic tracking, a 0.2 mm linearization threshold was employed for probabilistic streamlines. Figure
When linearization is unconstrained, an increase in maximum error threshold results in a quick increase in the linearization time. In this case, the maximum linearization distance parameter can constrain the process, controlling the linearization time, especially for high maximum error thresholds as shown in Table
Comparison of the compression time between linearizations constrained to 5, 10, and 25 mm and an unconstrained linearization.
MET (mm) | MLD = 5 mm | MLD = 10 mm | MLD = 25 mm | Unconstrained |
0.1 | 48,526 | 52,755 | 52,943 | 53,198 |
0.2 | 53,038 | 79,843 | 82,342 | 83,982 |
0.3 | 55,137 | 88,069 | 98,313 | 99,524 |
0.4 | 55,517 | 99,513 | 117,710 | 119,071 |
0.5 | 55,771 | 119,512 | 154,080 | 157,731 |
Figure
Comparison of the number of streamlines selected in the CC, under various MET and MLD values.
MET (mm) | MLD (mm) | Point-based | Segment-based (mean heuristic) | Segment-based (max length) | Missed by point-based (%) |
0 | 0 | 1,527 | 1,548 | 1,548 | 1.36 |
0.1 | 5 | 1,285 | 1,375 | 1,375 | 6.54 |
0.1 | 10 | 1,320 | 1,400 | 1,400 | 5.71 |
0.1 | 25 | 1,318 | 1,398 | 1,398 | 5.72 |
0.2 | 5 | 1,330 | 1,412 | 1,412 | 5.81 |
0.2 | 10 | 1,314 | 1,398 | 1,398 | 6.00 |
0.2 | 25 | 1,314 | 1,398 | 1,398 | 6.00 |
0.3 | 5 | 1,284 | 1,294 | 1,294 | 7.96 |
0.3 | 10 | 1,222 | 1,356 | 1,356 | 9.88 |
0.3 | 25 | 1,216 | 1,345 | 1,345 | 9.59 |
Comparison of the number of streamlines selected in the CST, under various MET and MLD values.
MET (mm) | MLD (mm) | Point-based | Segment-based (mean heuristic) | Segment-based (max length) | Missed by point-based (%) |
0 | 0 | 1,605 | 1,655 | 1,655 | 3.02 |
0.1 | 5 | 1,076 | 1,562 | 1,562 | 31.11 |
0.1 | 10 | 885 | 1,467 | 1,478 | 40.12 |
0.1 | 25 | 885 | 1,460 | 1,471 | 39.83 |
0.2 | 5 | 1,075 | 1,536 | 1,536 | 30.01 |
0.2 | 10 | 791 | 1,431 | 1,441 | 45.10 |
0.2 | 25 | 788 | 1,390 | 1,404 | 43.87 |
0.3 | 5 | 977 | 1,504 | 1,504 | 35.03 |
0.3 | 10 | 627 | 1,334 | 1,350 | 53.55 |
0.3 | 25 | 618 | 1,316 | 1,332 | 53.60 |
As both the maximum error threshold (MET) and maximum linearization distance (MLD) increase, more streamlines are missed due to the increasing number of missing points. When no compression is done, using the additional intersection tests selects only 3% more streamlines. However, even a maximum error threshold as low as 0.1 mm yields a loss of more than 30% of streamlines when the segment-based adaptation is not applied. The total number of selected streamlines varies as the parameters change; this variation is explained by the small changes in streamline paths caused by the compression. Compression with a smaller maximum error threshold has less effect on the path, but still modifies it slightly. Since it is fairly rare to linearize a long segment of a streamline, the heuristic is a good approximation of the complete calculation. Furthermore, the size of the box also affects these percentages: a bigger ROI increases the probability of encompassing points, while a thin ROI may contain no points at all, in which case only intersection tests permit any selection.
The difference in selection results between techniques is more difficult to observe on a real tractogram, since real tractograms are denser and normally compressed with a small maximum error threshold. Figure
Visual impact of compression on selected streamlines. The first row represents a small part of the corpus callosum (region of the anterior posterior midbody), while the second row represents a segmented corticospinal tract.
Table
Mean FA values computed over the left CST, for the uncompressed bundle (leftmost column) and increasing error thresholds.
Points | 0.55098 | 0.55006 | 0.54011 | 0.53195 | 0.52529 | 0.51808 | 0.51221 | 0.50902 | 0.49249 | 0.47406 |
Segments | 0.55113 | 0.55113 | 0.55113 | 0.55117 | 0.55126 | 0.55144 | 0.55182 | 0.55265 | 0.55366 | 0.55281 |
The reader might also have observed that the point-based and segment-based techniques do not provide the same values for the original, uncompressed bundle. This is due to the segment-based technique finding all voxels intersected by a streamline, even when no point of the streamline resides in the voxel. For example, a streamline might simply go over the corner of a voxel without actually having a point in that voxel, see Figure
Another trend is the steady drop in the mean FA value measured with the point-based technique as the compression error threshold increases. The main cause of this phenomenon is that straight parts of the bundle lose more points than curved regions. In the CST, for example, straight sections tend to be dense and oriented along a single direction, which increases the FA in those voxels, while curved sections tend to lie near crossings with other bundles, which can reduce the FA in those regions.
Normally, the average value of a metric should take into account all voxels that are traversed by at least one streamline. This would generate a representative average value. This average can either be based on a binarized map (i.e., each traversed voxel is used once), or weighted by using the number of times each voxel is traversed. Depending on the use case, this weighting generates better averages, since regions that are more densely packed will have more influence on the average value (Yeatman et al.,
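The two averaging schemes can be sketched as follows, reusing the hypothetical visit_count map from the traversal sketch above; metric_map stands for any scalar volume (FA, MD, ...) on the same grid:

```python
import numpy as np

def bundle_mean(metric_map, counts, weighted=True):
    """Mean metric over all voxels crossed by at least one streamline."""
    mask = counts > 0
    if weighted:
        # Densely traversed voxels pull the average more strongly.
        return float(np.average(metric_map[mask], weights=counts[mask]))
    # Binarized map: every traversed voxel contributes exactly once.
    return float(metric_map[mask].mean())
```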
This can be seen in Figure
Difference in voxel weighting between the original and compressed left CST bundle, using the point-based technique. Whiter regions have a higher weight loss and also highlight the non-uniformity of the weight loss for the bundle statistics. The zoomed portion shows how curvature affects compression, which directly influences the weight loss. The simplified view on the right helps to understand how a point-based method reduces streamline density in some voxels when streamlines are compressed. The overall influence of these voxels on the weighted average will decrease.
To verify that the segment-based technique works correctly for various shapes and sizes of bundles, we define
Table
Mean and standard deviation of
AD | pts | (3.5 ± 4.4) × 10−4 | (8.4 ± 6.5) × 10−3 | (1.6 ± 1.2) × 10−2 | (2.3 ± 1.5) × 10−3 |
AD | seg | (4.4 ± 5.1) × 10−7 | (5.3 ± 5.4) × 10−5 | (8.9 ± 9.9) × 10−4 | (1.2 ± 1.2) × 10−3 |
FA | pts | (0.0 ± 1.1) × 10−3 | (2.2 ± 1.5) × 10−2 | (6.1 ± 2.5) × 10−2 | (1.2 ± 0.31) × 10−1 |
FA | seg | (1.1 ± 1.8) × 10−6 | (7.0 ± 5.8) × 10−6 | (1.4 ± 1.1) × 10−3 | (2.2 ± 1.5) × 10−2 |
MD | pts | (1.4 ± 1.6) × 10−4 | (2.8 ± 2.8) × 10−3 | (1.3 ± 0.7) × 10−2 | (4.7 ± 2.4) × 10−2 |
MD | seg | (5.4 ± 9.3) × 10−7 | (5.5 ± 8.8) × 10−5 | (0.9 ± 1.4) × 10−3 | (0.8 ± 1.4) × 10−3 |
RD | pts | (5.3 ± 6.7) × 10−4 | (1.2 ± 1.0) × 10−2 | (4.1 ± 1.8) × 10−5 | (1.0 ± 0.4) × 10−1 |
RD | seg | (0.9 ± 1.6) × 10−6 | (0.8 ± 1.4) × 10−4 | (1.2 ± 2.2) × 10−3 | (1.4 ± 2.1) × 10−2 |
Vol | pts | (4.0 ± 1.0) × 10−3 | (8.5 ± 2.0) × 10−2 | (4.1 ± 0.7) × 10−1 | (7.1 ± 0.6) × 10−1 |
Vol | seg | (0.0 ± 0.0) × 100 | (3.6 ± 3.2) × 10−4 | (1.3 ± 0.9) × 10−3 | (5.0 ± 1.7) × 10−2 |
Compressing streamlines datasets can ease their transmission over networks and make them more easily loadable in a visualization application. One of our contributions is the implementation of the linearization phase directly in the loading code of a visualization application (MI-Brain). This removes the need to compress datasets before loading them, while keeping the interaction-performance benefits of lighter datasets. When implemented in such a way, the linearization phase increases loading time, but shortens subsequent parts of the pipeline, such as the initialization of data structures and data transfers to the GPU, resulting in a reduction of the overall time before display. Another contribution is the introduction of the maximum linearization distance parameter, which constrains the time needed for linearization by limiting the number of backward verifications. The linearization method is also available in Dipy (dipy.tracking.streamlinespeed).
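For reference, here is a minimal usage sketch of that Dipy function; its tol_error and max_segment_length arguments correspond to the maximum error threshold and the maximum linearization distance of Section 2.2, and the toy streamline is our own illustration:

```python
import numpy as np
from dipy.tracking.streamlinespeed import compress_streamlines

# Toy streamline: a gentle arc sampled with a roughly 0.16 mm step size.
t = np.linspace(0, np.pi, 200)
streamline = np.c_[10 * np.cos(t), 10 * np.sin(t), np.zeros_like(t)]
streamline = streamline.astype(np.float32)

# tol_error: maximum error threshold (mm);
# max_segment_length: maximum linearization distance (mm).
compressed = compress_streamlines(streamline, tol_error=0.2,
                                  max_segment_length=10.0)
print(len(streamline), '->', len(compressed))
```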
The number of points removed from the tractogram does not impact the visualization and enables large datasets to be visualized on medium-range computers. Compression of streamlines raises new questions, such as the minimal level of detail needed to perform virtual manual dissection, or the accuracy of the path recovered during tracking vs. its linearized version. These questions are very similar to issues in image or video compression used for scientific purposes; depending on the use case and the research, the answers vary greatly. Different needs will lead to different parameters for tractogram compression, or to the absence of compression. More detailed descriptions of the optimal parameters for linearization and their consequences are in the Appendix (
The removal of points on streamlines impacts the subsequent segmentation when the underlying technique is point-based. The proposed segment-based approach solves this issue without affecting the interaction speed. Using the segment-based method can even improve the selection of uncompressed streamlines, since the step size is never infinitely small and some streamlines can still be missed using the point-based technique. The use of this approach is necessary to ensure that streamlines compression will not impact manual segmentation or any further processing based on streamlines.
When performing a virtual manual dissection with diffusion tractography, researchers tend to use tractograms with millions of streamlines to ensure a proper recovery of the desired bundle shape and spatial coverage. Since compression is a good way to enable researchers to use tractograms with more streamlines, developing methods robust to non-uniform step sizes and compressed tractograms is crucial to avoid biased results. One of our contributions is the implementation of the robust interaction method in the visualization application MI-Brain.
It is important to mention that linearization does not have a uniform impact across the whole tractogram, because the curvature of each streamline affects the number of points removed. For example, researchers performing analyses on multiple bundles could introduce a segmentation bias that varies between bundles. More information about the effect of linearization on the interaction with tractograms is available in the Appendix (
Adapting tractometry is also necessary to correctly handle compressed streamlines datasets. Even in the case of uncompressed streamlines, it was shown that segment-based tractometry is more reliable, since it can identify voxels hosting only a short part of a streamline, even if no point of that streamline is present in said voxel. It may be argued that including such “unimportant voxels” could introduce some other kind of bias in the tractometry results. From preliminary tests, such a bias introduces almost no variability in the final metrics.
The robustness of tractometry metrics is important to ensure that derived statistics are not biased. Biased statistics could have adverse impacts on further analyses, for example generating false observations when studying neurodegenerative diseases. Furthermore, advanced tractometry techniques, such as Yeatman et al. (
Since compression modifies the streamlines, and most tools are not adapted to this, a dedicated format, or at least a dedicated tag in an existing format, is needed to store compressed streamlines datasets. The existing format from TrackVis (trk file) could support such a tag. This would prevent potentially invisible errors when using unadapted tractometry or bundle dissection tools with compressed streamlines. With a dedicated format, tool developers would need to add explicit support, and would therefore be made aware of the pitfalls of compressed streamlines. Using compressed data with unadapted pipelines can cause biased results because a high proportion of points has been discarded from the tractogram. Such results could lead to erroneous conclusions when compared to conclusions arising from a pipeline that takes the compression into account.
Finally, a new specific file format could prevent some confusion about the size of a dataset. Currently, most people know the approximate size of a tractogram if they know the number of streamlines and the step size. With a streamlines-specific compression algorithm, this could completely change how people judge a file size. Depending on the compression parameters, a file can use between 8 and 15 times less disk space than usual. For instance, when tractograms are shared between labs or for online databases, a format for compressed datasets would help. Not having such a dedicated file format could otherwise lead one to think the file is broken or sparse.
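As a rough worked example (the streamline length and step size are illustrative assumptions, not measured values): a tractogram of 500 k streamlines averaging 130 mm in length at a 1 mm step size holds about 130 points per streamline, i.e., 500,000 × 130 points × 3 coordinates × 4 bytes ≈ 0.78 GB on disk. At the 8 to 15 times compression ratios reported above, the same file drops to roughly 50 to 100 MB, which is consistent with fitting 100 such tractograms on an 8 GB USB stick.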
The way data is structured in a tractogram makes it hard to reduce its size efficiently. In a tractogram file (*.trk, *.tck, *.vtk), each streamline is independent and stored separately in an arbitrary order. Therefore, many methods commonly used for meshes cannot be applied to streamlines. Other methods to further compress the data on disk have been explored in Presseau et al. (
The advantages of tractography dataset compression for interactive exploration or data storage are compelling. However, current point-based methods are not able to robustly handle such files, and algorithmic segment-based adaptations are crucial before their use can become widespread. It is critically important to adapt streamlines segmentation and tractometry techniques so they correctly handle compressed streamlines and non-uniform sampling of points along streamlines. Without this adaptation, all techniques that rely on a point-based analysis of the streamlines will yield biased, if not simply incorrect, results. Since the size of streamlines datasets is expected to keep increasing in the coming years, we expect the use of compressed streamlines to gain more traction. We also expect that more tractography techniques will be able to generate streamlines with variable step sizes (Lemkaddem et al.,
Using all the techniques presented in this paper can improve the user experience with faster and easier interaction. Furthermore, the disk space saved using compressed streamlines is substantial. For example, 100 tractograms, each having 500 k streamlines, can now fit on an 8 GB USB stick instead of needing more than 80 GB of space. With these adaptations, millions of streamlines can be loaded and interacted with on a medium-range laptop, making 3D streamlines visualization, segmentation and tractometry available to a wider audience. The implementations in the graphical application MI-Brain and in the free open-source library Dipy are contributions aiming to put forward these improvements.
Study conception and design: MD, FR, and JH; acquisition of data and critical revision: MD and JH; analysis and interpretation of data and drafting of manuscript: FR and JH.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We would like to thank Jasmeen Sidhu, Samuel St-Jean, Michael Paquette, and Gabriel Girard from the Sherbrooke Connectivity Imaging Lab (SCIL) for their help during the early writing phase. The authors would like to thank NSERC Collaborative Research and Training Experience Program in Medical Image Analysis (CREATE-MIA) for funding this research. We also thank the Université de Sherbrooke institutional chair in neuroinformatics for their support.
The Supplementary Material for this article can be found online at: