Visualization, Interaction and Tractometry: Dealing with Millions of Streamlines from Diffusion MRI Tractography

Recently proposed tractography and connectomics approaches often require a very large number of streamlines, on the order of millions. Generating, storing and interacting with these datasets is currently quite difficult, since they require a lot of memory and processing time. Compression is a common approach to reduce data size. One such approach, proposed recently, consists of removing collinear points from the streamlines. Removing points, however, results in files that cannot be robustly post-processed or interacted with using existing tools, which are for the most part point-based. The aim of this work is to improve visualization, interaction and tractometry algorithms so that they robustly handle compressed tractography datasets. Our proposed improvements are threefold: (i) an efficient loading procedure to improve visualization (reducing memory usage by up to 95% for a 0.2 mm step size); (ii) interaction techniques robust to compressed tractograms; (iii) tractometry techniques robust to compressed tractograms, to eliminate biases in tract-based statistics. The present work demonstrates the need to correctly handle compressed streamlines to avoid biases in future tractometry and connectomics studies.


OPTIMAL PARAMETERS FOR EFFICIENT LINEARIZATION
Maximum error threshold. Using a 0.1 mm threshold for the linearization phase removes the majority of the points of a deterministic tractogram. Larger thresholds could cause voxel shifts and slightly modify the actual path of a streamline. This unwanted effect can be observed, but its consequences are limited even at high error thresholds such as 0.5 mm. A low threshold keeps compression time short while still removing most points. Overall, the load-time linearization of a tractogram reduces the waiting period before visualization and interaction. The initial drop in Figure 9 shows that compression, even as a supplementary step, reduces the overall waiting time before visualization or interaction.
To obtain comparable results in terms of compression ratio and compression time, the optimal maximum error threshold for probabilistic streamlines is slightly higher. Using a 0.1 mm threshold, only 65% of the points of a probabilistic tractogram were discarded. This is caused by the frequent local direction changes inherent to probabilistic tracking, which prevent the linearization process from removing long runs of consecutive points. The optimal threshold for this type of tractogram was found to be 0.2 mm. A higher threshold would encounter the same issues as high thresholds with deterministic streamlines, such as a longer compression time, potential visual differences, and very small gains in terms of disk or RAM space.
Only a human brain was used in our experiments, and our optimal maximum error threshold is based on the average size and shape of a human tractogram. When a tractogram is much smaller (infants or small animals, for example), the parameter should be chosen according to its dimensions and resolution. We suggest that the maximum error threshold be between one twentieth and one tenth of the resolution of the volume in which the tracking was computed.
Maximum linearization distance. An optimal choice for the maximum linearization distance was found to be 5 mm, for both deterministic and probabilistic datasets. Since most iterative tracking algorithms use a step size under 1 mm, a 5 mm maximum linearization distance is often enough to remove a large number of points. As the step size gets smaller, the maximum linearization distance parameter becomes more necessary to prevent excessively long compression times. Because the linearization is done at load time in visualization software, its duration must be limited so that the user experience is not degraded.
Using such a value (5 mm) for the maximum linearization distance lowers the probability of missing streamlines when the mean-segment-length heuristic is used for the neighborhood exploration, as demonstrated in Tables 2 and 3. Adding intersection tests for segments within a small extended neighborhood (the heuristic) is enough to select 95% of the streamlines correctly, and 100% are selected if the complete extended neighborhood is used. Since the complete ROI intersection test uses the maximum segment length in the whole tractogram, constraining the maximum segment length to 5 mm with the maximum linearization distance parameter reduces the potential number of segments to test for intersection.
Figure 10 demonstrates how both neighborhood definitions affect selection time and shows how the heuristic drastically reduces the computation time for streamline selection by limiting the number of intersection tests to perform. The first two bars of each group represent the selection time using the mean segment length search, and the last two represent the selection time using the maximum segment length search. At low maximum error thresholds or low maximum linearization distances, the selection time is relatively short. As either constraint increases, selection time using the maximum segment length search increases drastically. The heuristic can be used as an approximation during interaction, while the complete calculation can be reserved for visualizing or obtaining final results. At higher compression rates, the approximation using the mean segment length is the only way to achieve real-time selection with compressed streamlines while maintaining mostly accurate results. This time factor was important to take into consideration when choosing optimal compression parameters. In a situation where selection needs to be fully accurate and time is less of a concern, the maximum segment length should always be used instead of the heuristic.
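The difference between the two neighborhood definitions can be illustrated with a small sketch. It is not the implementation evaluated here: it assumes a spherical ROI, replaces the spatial neighborhood query with a linear scan, and all function names (`segment_lengths`, `distance_to_segment`, `select_streamlines`) are hypothetical. A segment is only tested for intersection when one of its endpoints lies within the ROI radius plus an extension, where the extension is either the mean segment length (heuristic) or the maximum segment length (complete test).

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Streamline = Sequence[Point]


def segment_lengths(streamlines: Sequence[Streamline]) -> List[float]:
    """Lengths of all segments of all streamlines."""
    return [math.dist(a, b) for pts in streamlines for a, b in zip(pts, pts[1:])]


def distance_to_segment(c: Point, a: Point, b: Point) -> float:
    """Distance from point c to the closest point of segment [a, b]."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    denom = sum(x * x for x in ab)
    t = 0.0 if denom == 0.0 else sum(ab[i] * ac[i] for i in range(3)) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(c, closest)


def select_streamlines(streamlines: Sequence[Streamline], center: Point,
                       radius: float, extension: float) -> List[int]:
    """Indices of streamlines with a segment crossing the spherical ROI.

    Only segments with an endpoint within radius + extension of the ROI
    center are tested (assumption: this stands in for the neighborhood query).
    """
    selected = []
    for idx, pts in enumerate(streamlines):
        for a, b in zip(pts, pts[1:]):
            nearest_end = min(math.dist(a, center), math.dist(b, center))
            if nearest_end <= radius + extension and \
                    distance_to_segment(center, a, b) <= radius:
                selected.append(idx)
                break
    return selected


# Toy tractogram: streamline 1 crosses the ROI with one long segment.
tractogram = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(-10.0, 5.0, 0.0), (10.0, 5.0, 0.0)],
    [(0.0, 4.0, 0.0), (0.0, 6.0, 0.0)],
]
lengths = segment_lengths(tractogram)
roi_center, roi_radius = (0.0, 5.0, 0.0), 1.0
heuristic = select_streamlines(tractogram, roi_center, roi_radius, mean(lengths))
complete = select_streamlines(tractogram, roi_center, roi_radius, max(lengths))
```

In this toy case the heuristic misses the streamline whose only segment is much longer than the mean, while the extension by the maximum segment length recovers it, at the cost of testing every segment that falls in the larger neighborhood.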
As mentioned previously, the experiments presented in this paper were done using human datasets. Every parameter expressed in millimeters should be adapted if a different type of data is used. We suggest that the maximum linearization distance should be between one tenth and one twentieth of the total dimensions in mm of the volume in which the tracking was computed.
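The two scaling suggestions share the same arithmetic, which can be written as a small helper. This is an illustrative sketch only: the name `suggested_range` is hypothetical, and it simply returns one twentieth and one tenth of a reference length in mm. For the maximum error threshold the reference is the tracking resolution; for the maximum linearization distance the text does not say how the volume dimensions reduce to a single length, so that choice is left to the caller.

```python
from typing import Tuple


def suggested_range(reference_mm: float) -> Tuple[float, float]:
    """Return (reference / 20, reference / 10), both in mm.

    Hypothetical helper: reference_mm is the tracking resolution when
    choosing a maximum error threshold, or a caller-chosen volume
    dimension when choosing a maximum linearization distance.
    """
    if reference_mm <= 0:
        raise ValueError("reference_mm must be positive")
    return reference_mm / 20.0, reference_mm / 10.0


# Example: maximum error threshold range for a 2 mm tracking resolution.
met_low, met_high = suggested_range(2.0)
```

For a 2 mm resolution this gives a maximum error threshold between 0.1 mm and 0.2 mm; for a 1 mm resolution, between 0.05 mm and 0.1 mm.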

Inputs:
  1. P, sequence of 3D points (one streamline)
  2. Maximum error threshold (MET), float
  3. Maximum linearization distance (MLD), float
Result: compressStreamlines, sequence of 3D points (one streamline)

Function Linearization(P, MET, MLD):
  i = 0, j = 1, n = nbr points in P
  ADD P[i] to compressStreamlines
  while i < n do
    if distance between P[i] and P[i+j] > MLD then
      ADD P[i+j-1] to compressStreamlines
      i = i+j-1
      j = 1
      CONTINUE
    end
    if i+j == n then
      ADD P[i+j] to compressStreamlines
      i = i+j
      BREAK
    end
    for k between i and i+j do
      if distance between line(P[i], P[i+j]) and P[k] > MET then
        ADD P[i+j-1] to compressStreamlines
        i = i+j-1
        j = 1
        BREAK
      else
        CONTINUE
      end
      j += 1
    end
  end
  return compressStreamlines

Algorithm 1: Simplified linearization algorithm
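Algorithm 1 is deliberately simplified, so a runnable reading of it requires a few choices. The sketch below is one such reading, not the implementation used for the experiments: it assumes 0-based indexing, starts each search at the second point after the last kept one (the next point is always kept-compatible), checks the distance constraint and the error constraint together, and always appends the final point of the streamline. The names `linearize`, `max_error` and `max_distance` are hypothetical stand-ins for Linearization, MET and MLD.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def point_to_line_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b."""
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ap = (p[0] - a[0], p[1] - a[1], p[2] - a[2])
    norm_ab = math.hypot(*ab)
    if norm_ab == 0.0:  # degenerate line: fall back to point distance
        return math.hypot(*ap)
    cross = (ap[1] * ab[2] - ap[2] * ab[1],
             ap[2] * ab[0] - ap[0] * ab[2],
             ap[0] * ab[1] - ap[1] * ab[0])
    return math.hypot(*cross) / norm_ab


def linearize(points: Sequence[Point], max_error: float = 0.1,
              max_distance: float = 5.0) -> List[Point]:
    """Drop nearly collinear points from one streamline.

    max_error plays the role of MET and max_distance the role of MLD (mm).
    """
    n = len(points)
    if n <= 2:
        return list(points)
    kept = [points[0]]
    i = 0  # index of the last kept point
    j = 2  # index of the candidate end of the current segment
    while j < n:
        too_long = math.dist(points[i], points[j]) > max_distance
        off_line = any(
            point_to_line_distance(points[k], points[i], points[j]) > max_error
            for k in range(i + 1, j)
        )
        if too_long or off_line:
            # The previous candidate was the last valid end point.
            kept.append(points[j - 1])
            i = j - 1
            j = i + 2
        else:
            j += 1
    kept.append(points[-1])  # the streamline always keeps its end point
    return kept


# Straight streamline with a 1 mm step: only the 5 mm limit splits it.
straight = [(float(x), 0.0, 0.0) for x in range(11)]
compressed_straight = linearize(straight, max_error=0.1, max_distance=5.0)

# Right-angle turn: the corner point must survive compression.
corner = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
          (2.0, 1.0, 0.0), (2.0, 2.0, 0.0)]
compressed_corner = linearize(corner, max_error=0.1, max_distance=100.0)
```

On the straight example only the maximum linearization distance limits the segment length, while on the right-angle example the error threshold forces the corner point to be kept.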