Visualizing Data Mining Results with the Brede Tools

A few neuroinformatics databases now exist that record results from neuroimaging studies in the form of brain coordinates in stereotaxic space. The Brede Toolbox was originally developed to extract, analyze and visualize data from one of them – the BrainMap database. Since then the Brede Toolbox has expanded and now includes its own database with coordinates along with ontologies for brain regions and functions: the Brede Database. With the Brede Toolbox and Database combined, we set up automated workflows for extraction of data, mass meta-analytic data mining and visualizations. Most of the Web presence of the Brede Database is established by a single script executing a workflow involving these steps together with a final generation of Web pages with embedded visualizations and links to interactive three-dimensional models in the Virtual Reality Modeling Language. Apart from the Brede tools, I briefly review alternative visualization tools and methods for Internet-based visualization and information visualization as well as portals for visualization tools.


INTRODUCTION
In a narrow sense, neuroimaging workflows involve neuroimaging image processing and analysis. In a broader sense, the workflow in a neuroimaging study involves a number of other processes: gathering information, designing the experiment, brain scanning, interpretation of the study, relating it to other studies and communicating the study. Data mining in neuroimaging may not only be applied as the standard neuroimaging analysis but may also be set to work on other components in the workflow, and visualization of the data mining results may help the individual researcher in understanding his or her data as well as in communication with other researchers.
A number of tools exist for visualizing neuroimaging data mining results when the result is a volumetric neuroimage. There are, however, also visualization tools for other aspects of the neuroimaging process, and one example is our Brede Toolbox (Nielsen and Hansen, 2000a). Starting out as a program for handling and visualization of data from the BrainMap database of Fox et al. (1994), the Brede Toolbox now includes its own database of results from neuroimaging, the Brede Database (Nielsen, 2003), as well as analysis and visualization functions for a range of tasks. We have set up an automated workflow involving a few non-interactive batch scripts that construct practically the entire Web presence of the Brede Database with static Web pages and visualizations. Furthermore, automated workflows using the ontologies of the Brede Database can perform mass meta-analysis across brain functions or brain regions (Nielsen, 2005; Nielsen et al., 2006a).
We may ask whether it is at all possible to make appropriate analyses and visualizations across studies in neuroimaging.
Recent work with image-based meta-analysis has shown the possibility of constructing sensible forest and funnel plots for functional neuroimaging data (Salimi-Khorshidi et al., 2009b).

FIGURE 1 | Meta-analytic forest plot as a Web service with studies on personality genetics. Components in the Scalable Vector Graphics image file are hyperlinked and the content may be controlled interactively through an HTML form.

Nielsen
Visualizing data mining results

FIGURE 3 | Two examples of coordinates in a 3D corner cube visualization. (A) Coordinates from the five studies in the Brede Database authored by Edward T. Bullmore. The 3D glyphs have type and color according to paper: dark blue (Phillips et al., 1997), light blue (Phillips et al., 1998), light green boxes (Bullmore et al., 1996), orange spheres (Hunkin et al., 2002), red (Calvert et al., 1999). (B) Cingulate coordinates colored according to the clustering results after a text mining of abstracts in the Brede Database. Dark magenta glyphs are from the 'memory' cluster while the light yellow are from the 'pain' cluster. From Nielsen et al. (2006a).

One simple visualization plots the positive results, i.e., the reported coordinates, in stereotaxic space. The program associated with the original BrainMap database displayed coordinates in a 2D tri-planar plot (Fox et al., 1994). This type of visualization is maintained in a newer version of the database with the program Sleuth (Laird et al., 2005). WebCaret may display coordinates in 3D as colored spheres together with an inflated cortical surface (Van Essen and Dierker, 2007), see Figure 2. The Brede Toolbox can generate 3D visualizations in the corner cube style of Rehm et al. (1998). Plotting points in 3D is not straightforward: simple 'zero'-dimensional graphics do not give the important perception of depth, and therefore we use 3D glyphs of different color and shape. To help the viewer in spatially localizing the coordinates we can add components in a configurable workflow, such as AC/PC axes, stalks for the glyphs, glyph shadows on the tri-planar walls, and contour and cerebral cortex outlines from the atlas of Talairach and Tournoux (1988). Figure 3 shows two visualizations of this kind, with Figure 3A displaying all coordinates in the Brede Database from papers authored by Edward T. Bullmore and Figure 3B displaying cingulate coordinates colored according to results from a text mining of the associated abstracts (Nielsen et al., 2006a). The batch script setup for the Brede Database will automatically generate a plot like Figure 3A for each author mentioned in the author ontology. Sometimes these simple plots reveal interesting features: the Bullmore coordinates appear somewhat limited to the middle of the inferior-superior axis, perhaps reflecting a restricted field of view selected for some of the studies. The elaborate and automated workflow for generating a plot like Figure 3B involves:
1. Select a brain region and from the Brede Database brain region ontology get all naming variations of the brain region and its subareas. With these names extract coordinates from papers recorded in the database, model their spatial distribution and include extra non-matched coordinates that lie within the region.
2. Get abstracts from the Brede Database that, for the brain region in question, have one or more coordinates and perform text mining, which results in clusters of themes, such as 'pain' and 'memory', and documents belonging to these clusters.
3. Perform statistical tests on the spatial distribution of the coordinates grouped according to the text mining clusters to determine if the text mining has discovered functions that are segregated in the region.
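Step 3 of the workflow can be illustrated with a small sketch. The text does not state which statistical test is used, so the example assumes a simple permutation test on the distance between the centroids of two groups of coordinates. The function name and the synthetic coordinates are hypothetical and serve only to show the idea of testing for spatial segregation.

```python
import numpy as np


def centroid_permutation_test(coords_a, coords_b, n_permutations=2000, seed=0):
    """Permutation p-value for the distance between two cluster centroids.

    coords_a, coords_b: arrays of shape (n, 3) with x, y, z coordinates in mm.
    This is an illustrative test, not necessarily the one used in the toolbox.
    """
    rng = np.random.default_rng(seed)
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    pooled = np.vstack([coords_a, coords_b])
    n_a = len(coords_a)

    def centroid_distance(first, second):
        return np.linalg.norm(first.mean(axis=0) - second.mean(axis=0))

    observed = centroid_distance(coords_a, coords_b)
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # shuffles the rows (coordinates)
        if centroid_distance(shuffled[:n_a], shuffled[n_a:]) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic example: two groups separated mainly along the y-axis
rng = np.random.default_rng(1)
memory_cluster = rng.normal(loc=[0, -45, 25], scale=6, size=(30, 3))
pain_cluster = rng.normal(loc=[0, 15, 30], scale=6, size=(30, 3))
distance, p = centroid_permutation_test(memory_cluster, pain_cluster)
print(f"centroid distance = {distance:.1f} mm, p = {p:.4f}")
```

A small p-value here would suggest that the two text-derived groups occupy different parts of the region. Sorting regions by such a value corresponds loosely to the ranking by statistical significance described in the text.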
The procedure is done for all brain regions in the Brede Database brain region ontology, and Figure 3B shows one of the regions that ranked high after sorting brain regions according to statistical significance in the spatial distribution test. Data mining directly with the coordinates has been termed coordinate-based meta-analysis (CBMA) and several methods exist (Wager et al., 2009), see also Laird et al. (2009), this issue. For the most part they involve a form of estimation of a conditional probability density p(v|c) in stereotaxic space v. The conditioning, c, may be, e.g., a specific brain function or a specific anatomical label. Once the probability density is estimated it can be converted to a volume by sampling the probability density in voxels and visualized in the same way as standard neuroimages, or the density can be used to color-code the cortical surfaces in a 3D visualization, see Wager et al. (2009). Fox et al. (1997) introduced the method of modeling the probability density: a single confined area, the primary motor area for the mouth, was examined, so only a model with mean and standard deviation was devised, i.e., a simple Gaussian model. As more complex brain functions are distributed in brain space, more flexible models are needed. Our first effort in modeling the probability density was by Gaussian mixture models (Nielsen and Hansen, 1999):
p(v|c) = Σ_k p(v|k) P(k|c),
where each p(v|k) estimates a 3D Gaussian probability density and P(k|c) weights component k for the condition c. Figure 4A shows the isosurfaces in a model of this type where the parameters have been fitted to data from the BrainMap database.
Here, each ellipsoid corresponds to a single Gaussian p(v|k), and c corresponds to three different labels of 'behavioral domain' from the BrainMap database that are associated with each coordinate.
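To make the mixture formulation concrete, the following minimal sketch evaluates a conditional density p(v|c) as a weighted sum of 3D Gaussians p(v|k). The component means, covariances and weights are invented for illustration, and the helper names are hypothetical. Fitting the parameters to real coordinates is not shown.

```python
import numpy as np


def gaussian_density(points, mean, covariance):
    """Evaluate a 3D Gaussian density p(v|k) at each row of `points`."""
    diff = points - mean
    inverse = np.linalg.inv(covariance)
    # Squared Mahalanobis distance for every point
    mahalanobis = np.einsum("ij,jk,ik->i", diff, inverse, diff)
    dim = mean.shape[0]
    norm = np.sqrt((2 * np.pi) ** dim * np.linalg.det(covariance))
    return np.exp(-0.5 * mahalanobis) / norm


def mixture_density(points, means, covariances, weights):
    """Evaluate the weighted sum of component densities; `weights` sum to one."""
    density = np.zeros(len(points))
    for mean, covariance, weight in zip(means, covariances, weights):
        density += weight * gaussian_density(points, mean, covariance)
    return density


# Hypothetical two-component model for one label c (coordinates in mm)
means = np.array([[-40.0, -20.0, 50.0], [40.0, -20.0, 50.0]])
covariances = np.array([np.diag([64.0, 100.0, 64.0])] * 2)
weights = np.array([0.6, 0.4])  # mixing proportions for label c

query = np.array([[-40.0, -20.0, 50.0], [0.0, -20.0, 50.0]])
density = mixture_density(query, means, covariances, weights)
print(density)
```

An isosurface of such a density at a fixed level gives one ellipsoid-like surface per well-separated component, which is the kind of shape described for Figure 4A.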
Although the Gaussian mixture model may generalize, the ellipsoids do not look neuroanatomically plausible and call for yet more flexible models. Figure 4B is generated with kernel density estimation using a Gaussian kernel (Nielsen and Hansen, 2000b). Such models seem to generate probability densities that are somewhat more neuroanatomically plausible than the Gaussian mixture model. The isosurface levels for the probability densities in both subplots of Figure 4 have been set for display purposes. More statistically grounded values can be obtained with the methods of Turkeltaub et al. (2002), Nielsen (2005) and Costafreda et al. (2009). The methods for probability density estimation of coordinates are not limited to activations but may be applied to any kind of coordinates in stereotaxic space from 'deactivations', cortical stimulations, lesions or structural changes, e.g., obtained with voxel-based morphometry.
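Kernel density estimation followed by sampling in voxels can be sketched as follows. This is a minimal illustration that assumes an isotropic Gaussian kernel with a fixed, arbitrarily chosen bandwidth. The grid extent, the coordinates and the function name are hypothetical and not taken from the toolbox.

```python
import numpy as np


def kernel_density_volume(coords, grid_axes, bandwidth=10.0):
    """Sample a Gaussian kernel density estimate on a regular voxel grid.

    coords: (n, 3) array of stereotaxic coordinates in mm.
    grid_axes: tuple of three 1D arrays with voxel centers along x, y, z.
    bandwidth: standard deviation of the isotropic Gaussian kernel in mm
        (an arbitrary choice here, not a value from the paper).
    """
    coords = np.asarray(coords, dtype=float)
    gx, gy, gz = np.meshgrid(*grid_axes, indexing="ij")
    voxels = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)
    # Squared distance between every voxel and every coordinate
    sq_dist = ((voxels[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    norm = (2 * np.pi * bandwidth ** 2) ** 1.5
    density = np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1)
    density /= norm * len(coords)
    return density.reshape(gx.shape)


# Hypothetical coordinates and a coarse 8 mm grid
coords = np.array([[-42, -22, 8], [-46, -18, 10], [44, -20, 8]])
axes = (np.arange(-72, 73, 8), np.arange(-104, 73, 8), np.arange(-48, 81, 8))
volume = kernel_density_volume(coords, axes)
peak = np.unravel_index(volume.argmax(), volume.shape)
print(volume.shape, [axis[i] for axis, i in zip(axes, peak)])
```

The resulting array can be thresholded for an isosurface or flattened with `volume.ravel()` into a single vector of voxel values.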
When a probability density estimate is constructed for a set of coordinates and converted to a voxel volume, the volumes across multiple sets of coordinates may be aggregated into a single data matrix X(sets × voxels). This data matrix may then be decomposed with multivariate analysis in a number of ways, e.g., with singular value decomposition for principal component analysis, ULVᵀ = X, where the left factorization matrix U(sets × components) contains loadings over sets of coordinates for each principal component and the right factorization matrix V(voxels × components) contains loadings over voxels. Other types of decomposition for this matrix are independent component analysis (MS = X, with M the mixing matrix and S the source matrix), non-negative matrix factorization (WH = X) and K-means clustering (CA = X, with C a centroid matrix and A an assignment matrix). The right decomposition matrices V, S, H and A all contain vectors that each represent a volume. As part of the workflow for presenting the information in the Brede Database on the Web, the decompositions work on data matrices formed from sets of papers and sets of experiments, and corner cube visualizations are automatically constructed with isosurfaces in the volumes contained in the right decomposition matrices. Figure 5 shows such a visualization for a component from non-negative matrix factorization, i.e., a row in the H matrix. Such visualizations may be useful for navigating among the studies in the database, and to a certain extent they reveal spatial distributions of the 'cognitive components' of the brain. Together with the visualization on the Web page are listed the experiments that have high association with the component, i.e., experiments associated with large elements in a column of the left matrix W. For the component in Figure 5 they are experiments described as, e.g., 'Visual object decision', 'Buildings visual objects', 'Color perception during free viewing' and 'Passively viewed scenes'.
Before putting too much trust in visualizations and analysis across studies one needs to remember that the study results may have arisen in quite different ways. In standard meta-analysis the only variations between studies that are usually modeled are the number of subjects and the standard deviation of the data in the individual studies. In neuroimaging meta-analysis and visualization these variables are not usually modeled; for an exception see Fox et al. (1997). Besides, there are several other variables that are not considered either: the varying thresholds applied, e.g., corrected and uncorrected P-values (Nielsen et al., 2006b), the difference in field of view between studies, the reporting style of coordinates (e.g., 'extent threshold', 'number of maxima per cluster') as well as the variation from the different preprocessing and analysis choices that have been made. Furthermore, the different CBMA models may produce different results on the same material. Salimi-Khorshidi et al. (2009a) compared different CBMA models, and their application of a threshold makes a 'blob' appear and disappear depending on the type of CBMA.

FIGURE 4 | (A) Perception (red wireframe), cognition (green surface) and motion ('M'-textured surface). From Nielsen and Hansen (1999). (B) Kernel density modeling of auditory (red wireframe) and vision (green) studies. From Nielsen and Hansen (2000b).

INTERNET-BASED VISUALIZATION
Quite a few tools exist for interactive neuroimaging visualization across the Internet. Often these tools are based on a client-server model with the client implementing the visualization and graphical user interface in Java. Among these tools is JIV, which renders multiple volume data by orthogonal slice views and is implemented as a Java applet (Cocosco and Evans, 2001). iiV implements a similar functionality (Lee et al., 2008), and MindSeer can also render in 3D remotely (Moore et al., 2007). NeuroTerrain implements 3D visualization and has demonstrated its use in connection with a mouse atlas (Gustafson et al., 2007).
The Talairach Applet renders a digital representation of the Talairach Atlas and combines it with neuroanatomical labeling of coordinates via the Talairach Daemon described by Lancaster et al. (2000). Also in connection with the BrainMap database, the Java client program Sleuth plots 3D points in orthogonal 2D slices based on user queries to the BrainMap server (Laird et al., 2005).
The Internet Brain Volume Database (IBVD) records published values for brain region volumes across variables such as gender and diagnosis (Kennedy et al., 2003). Since the neuroimaging data analysis arrives at one single value, the brain volume in cubic centimeters, the visualization of the data is relatively simple compared to other neuroinformatics visualizations: from Web-based user queries IBVD generates on-the-fly PNG image files with the brain volumes from the different studies plotted as a function of age with color-coding and the variability indicated. Interactive visualization systems for neuroimages with server-side 3D rendering have been described by Poliakov et al. (2005) and a public system is available with the WebCaret Web service, see Figure 2.
With the Brede Toolbox we construct 3D visualizations browsable on the Web by using the Virtual Reality Modeling Language (VRML) (ISO/IEC, 1997; Nielsen and Hansen, 2000a), see the VRML examples in Figure 4. When defined in the middle of the 1990s VRML held great promise to get widespread use for 3D interactive and hyperlinked visualizations, but since then it has had limited growth: VRML lacks good browser implementations and there has been erratic adoption of a scripting language. Nevertheless, it is one of the few means for Web distribution of 3D content in free standardized form. An alternative format is the Universal 3D File Format (U3D) that can be embedded in newer versions of the PDF format. Apart from the Brede Toolbox, ImageSurfer, described by Feng et al. (2007), implements VRML export.
For the Web presentation of the Brede Database, we generate 3D corner cube visualizations of the coordinates in the database with an offline Matlab batch script, both as image files embedded on the Web page and as VRML files, see Figure 6. Matlab is not well suited to work as a Web script, and for the interactive Web scripts associated with the Brede Database there is presently no visualization implemented. The INC Interactive Talairach Atlas renders 2D orthogonal slices from the Talairach and the MNI single subject atlases. This Web service can merge a user-given coordinate with the visualization, and as such we use it for visualization of individual coordinates from the Brede Database and the Brede Wiki, see Figure 7 for an example.
Besides Java, VRML and standard image files such as PNG, the Scalable Vector Graphics (SVG) format may prove useful for Internet-based visualizations, see Figure 1 for an example. These files may contain hyperlinks and JavaScript. However, Web browsers do not yet consistently implement the standard.

INFORMATION VISUALIZATION
Data mining results from neuroimaging analysis are not the only type of information for visualization. Information about the background, design, scanning, analysis procedure, and interpretation surrounds the data mining results of a typical neuroimaging study. In scientific articles, the body text mostly carries this 'context' information, though sometimes authors also use tables to describe, e.g., subject information. Authors rarely apply visualizations for this kind of information except in situations with explanation of the experimental design and scanning. The experimental design has a natural temporal evolution and as such the visualization often displays the design as a function of time. Users of the behavioral experiment software from Psychology Software Tools are familiar with the graphical programming environment of E-Prime, which has this kind of visualization as an integral part of the development of the experiment. Other parts of the neuroimaging study may be visualized with what is usually referred to as information visualization.
In a demonstration visualization, we employed a torus topology for an entire neuroimaging study process, constructing 3D icons for 'funding', the experimental design, authors, experimental subjects, etc. (Nielsen and Hansen, 1997), see also Figure 8. The usefulness of such a visualization depends on how effectively it conveys information compared to standard text, and if the visualization format requires specialized programs with limited distribution for rendering and interaction the impact may be small. Manual creation of these visualizations is infeasible: the visualization should be constructed automatically from a description of the study, e.g., the so-called 'provenance' (Fissell, 2007). In related visualizations, some workflow management systems display the processing flow graphically (Dinov et al., 2008).
When neuroimaging studies get reported in articles, the relationships between the articles can be turned into visualizations. Many types of visualizations exist and many relationships may be revealed: between terms, concepts, citations to and from articles as well as between authors, cited authors and cited journals. The visualizations are of course not limited to articles in neuroimaging only, see, e.g., Card et al. (1999); Chen (1999). For an example in neuroscience, Naud et al. (2007) use a spherical embedding algorithm to display a bipartite graph in 3D space with two spheres. One of their illustrations visualized the relationship between poster sessions in the Society for Neuroscience 2006 meeting together with words from the abstracts in the sessions. Another example of text mining result visualization is what we termed a 'cluster bush', which describes the clusters in a hierarchical multivariate analysis: clusters are indicated with dots and thick lines indicate a large similarity between two clusters. Given a set of abstracts, the automated workflow for generating a plot like Figure 9 involves the conversion of the texts to a bag-of-words matrix, the exclusion of a large number of words (stop words), hierarchical non-negative matrix factorization and lastly the 'cluster bush' visualization, all implemented with the functions of the Brede Toolbox.
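The first steps of this text mining workflow can be sketched in a few lines. The example below builds a bag-of-words matrix with stop word exclusion and runs a single-level non-negative matrix factorization with multiplicative updates. It is only an illustration: the hierarchical factorization and the 'cluster bush' plot are not shown, and the stop word list, the toy abstracts and the function names are all hypothetical.

```python
import re
import numpy as np

# A tiny illustrative stop word list; a real list would be much longer
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "is", "with", "for", "during"}


def bag_of_words(texts, stop_words=STOP_WORDS):
    """Return a (documents x terms) count matrix and the sorted term list."""
    tokenized = [
        [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]
        for text in texts
    ]
    terms = sorted({w for words in tokenized for w in words})
    index = {term: i for i, term in enumerate(terms)}
    counts = np.zeros((len(texts), len(terms)))
    for row, words in enumerate(tokenized):
        for w in words:
            counts[row, index[w]] += 1
    return counts, terms


def nmf(X, n_components, n_iterations=500, seed=0):
    """Factorize X ~ W @ H with multiplicative updates (Euclidean cost)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_components)) + 0.1
    H = rng.random((n_components, X.shape[1])) + 0.1
    eps = 1e-9  # avoids division by zero
    for _ in range(n_iterations):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


abstracts = [
    "Pain intensity and the anterior cingulate during noxious heat",
    "Noxious heat pain activates cingulate cortex",
    "Episodic memory retrieval in the posterior cingulate",
    "Memory retrieval and recognition of words",
]
X, terms = bag_of_words(abstracts)
W, H = nmf(X, n_components=2)
clusters = W.argmax(axis=1)  # assign each abstract to its strongest component
for k in range(2):
    top = [terms[i] for i in H[k].argsort()[::-1][:3]]
    print(f"component {k}: {top}")
print("cluster assignment:", clusters)
```

Each row of H weights the terms of one theme and each row of W gives the association of an abstract with the themes, so the largest entries can be used to label and populate clusters.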
Coordinate-based meta-analysis and text mining can be combined to form visualizations, see Figure 10 and Nielsen et al. (2004). The workflow for constructing the visualization in the figure involves the setup of a matrix describing the words in the abstracts of papers and the construction of another matrix from kernel density estimation with the coordinates in each paper. After non-negative matrix factorization each individual factor may be rendered in 3D and associated with words from the abstracts, e.g., the blue area in Figure 10A in the occipital lobe is associated with words such as 'visual' and 'eye'.
Based on a corpus of articles published between 1997 and 2000 in the journal NeuroImage we could plot cited authors and cited journals in 2D. The data mining with visualization would for example reveal a dichotomy between PET and fMRI (Nielsen, 2002), see Figure 11. Here, the workflow involves specialized algorithms that extract citations and the use of matrix computations, particularly singular value decomposition, for multidimensional scaling-like projection of the data onto 2D. For the Brede Database, we automatically construct what we have termed 'bullseye plots' to display the network of coauthors for each recorded author. Figure 12 shows a larger bullseye plot on coauthors in the NeuroImage corpus. Authors near the center, such as Friston and Dolan, have high network degrees, which here correspond to the number of authored articles (Nielsen, 2002).
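The projection of citation data onto 2D with singular value decomposition can be illustrated with a minimal sketch. The count matrix below is invented, the row labels are placeholders, and the column centering and scaling by singular values are assumptions made for the example rather than details given in the text. The citation extraction step is not shown.

```python
import numpy as np


def svd_projection_2d(counts):
    """Project the rows of a count matrix onto the two leading SVD components."""
    X = np.asarray(counts, dtype=float)
    X = X - X.mean(axis=0)  # center columns; a preprocessing assumption
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :2] * s[:2]  # one (x, y) position per row


# Hypothetical counts: rows are cited journals, columns are citing articles
labels = ["PET journal A", "PET journal B", "fMRI journal A", "fMRI journal B"]
counts = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 1, 0],
    [0, 0, 1, 4, 3, 2],
    [0, 1, 0, 3, 4, 3],
]
positions = svd_projection_2d(counts)
for label, (x, y) in zip(labels, positions):
    print(f"{label}: ({x:.2f}, {y:.2f})")
```

In this toy example the first axis separates the two groups of rows, which is the kind of dichotomy a 2D plot of the projected positions would make visible.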
The well-tested and widely used GraphViz package provides spatial graph layout for a given network (Gansner and North, 2000). At one point the PubGene Web service used GraphViz in a large-scale application for displaying relations between genes based on literature in PubMed (Jenssen et al., 2001). GraphViz lays out graphs for the Web presentation of the Brede Database. These graphs display the brain function and brain region ontologies, e.g., indicating that 'vision' has 'perception' as taxonomic parent or that the cingulate area is a parent for the posterior cingulate, see Figure 13. Our workflow with the Brede Toolbox involves extraction of the ontology from Brede Database XML files, construction of a file with the graph that GraphViz reads, invoking GraphViz for generation of an image file, and then finally construction of the Web page with the image file embedded. GraphViz can construct HTML image maps so the nodes in the graph image are associated with clickable hyperlinks. On the final Web page a reader may navigate the brain region and brain function ontologies by clicking on the nodes in the graph. The Brede Toolbox can also use GraphViz for layout of other types of data that can be described as a network, e.g., from structural equation modeling of regional neuroimaging data. A number of journal Web sites use plots called 'Citation map' in the style of GraphViz for visualizing in- and out-going citations of each article, see, e.g., the BMJ and The Journal of Neuroscience Web sites.
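The step of constructing a graph file for GraphViz can be sketched as below. The example writes DOT source for a small taxonomy with one hyperlink per node. The function name and the URL scheme are hypothetical, the two parent relations are those mentioned in the text, and reading the ontology from the XML files is not shown.

```python
def ontology_to_dot(parent_of, url_for):
    """Build GraphViz DOT source for a taxonomy given child -> parent links."""
    lines = ["digraph ontology {", "  rankdir=BT;", "  node [shape=box];"]
    nodes = sorted(set(parent_of) | set(parent_of.values()))
    for name in nodes:
        # The URL attribute is what GraphViz uses when emitting image maps
        lines.append(f'  "{name}" [URL="{url_for(name)}"];')
    for child, parent in sorted(parent_of.items()):
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


# Relations mentioned in the text; the URL scheme is hypothetical
parent_of = {
    "vision": "perception",
    "posterior cingulate": "cingulate area",
}
dot_source = ontology_to_dot(
    parent_of, lambda name: "node_" + name.replace(" ", "_") + ".html"
)
print(dot_source)
```

Saved as, say, `ontology.dot`, such a file could be rendered with `dot -Tpng ontology.dot -o ontology.png`, and `dot -Tcmapx ontology.dot -o ontology.map` would emit a client-side image map for embedding in the Web page.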
Another type of graph visualization within neuroimaging is the interactive graph visualization with a hyperbolic browser that features in tools from the Laboratory of Neuro Imaging (LONI): LOVE and iTools (Dinov et al., 2006, 2008). ISI Web of Knowledge provides a Java applet to render their citation information with a similar topology.

CONCLUSION AND FUTURE WORK
With the Brede Toolbox we are able to build a workflow with extraction of data from the Brede Database, automated data mining and visualizations. The automated procedures generate publicly accessible Web pages with interactive visualizations. An advantage of the automated procedure is that little human intervention is required to update the visualizations as new data is added to the database. The visualizations can display not only spatial neuroimages, but for example also results from text mining, and visualization can take place across the Internet with data originating on one server and displayed on another.
The Brede Database represents just a small fragment of the results from the published literature (Derrfuss and Mar, 2009). Databases such as NeuroNames, BrainMap and SumsDB are much larger. However, no universal database exists for coordinates from functional neuroimaging. To gain a higher degree of coverage, future work may attempt to aggregate data from different databases for combined visualizations. Since typical meta-analytic data is anonymous and small (compared to a typical neuroimaging study), it is easier to share such data and we may see collaborative Internet-based analyses and visualizations. Our wiki for personality genetics (Figure 1) is such a collaborative system. Building a collaborative system for neuroimaging data requires