Neuroscience Simulation Data Format (NSDF): HDF-based format for large simulation datasets
Nencki Institute of Experimental Biology, Poland
National Centre for Biological Sciences, India
With the growing importance of simulations in neuroscience, the storage and management of data from in silico experiments has become a common challenge. As computational models grow in size and complexity along with hardware capabilities, the amount of generated data is becoming prohibitively large for text-based formats such as CSV (comma-separated values). In the absence of an efficient standard format, individuals and groups often create ad hoc formats and develop analysis tools around them. This hampers the reuse of code and the sharing of data. The analogous problem for model description is being addressed by several community standards (CellML, NeuroML, NineML, SBML). Taking a different approach, the Neo library (Garcia et al., 2014) aims to address these problems for electrophysiology data. However, there is currently no functional solution for sharing simulation data. An open standard for simulation data would facilitate the sharing and development of analysis and visualization tools. Moreover, as many journals (PLOS, Science, Nature) now require data sharing for publication, a common data format would also help the review and verification process in computational neuroscience research.
Here, we propose a format for storing and sharing data from simulation experiments in computational neuroscience. The format is based on HDF5, a flexible, portable and efficient file format for very large datasets. HDF5 is used in a wide range of scientific domains with diverse requirements (Klein and Taaheri, 2007; de Buyl et al., 2014).
Neuroscience simulations span a wide range of spatial and temporal scales and of model complexity. We have developed the Neuroscience Simulation Data Format (NSDF) for data ranging from point processes to detailed compartmental models and networks, as illustrated by the typical use cases in Table 1. For efficient storage and retrieval, we propose saving data in 2D arrays whenever possible, organized as in Figure 1. The format also incorporates data structures for storing morphology and connectivity information, and uses HDF5 attributes for storing metadata. We have used this format in tools for data visualization and analysis, and to share a collection of open data from simulations of the thalamocortical loop (Głąbska et al., 2014).
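As a rough illustration of why 2D storage helps, here is a minimal sketch in plain Python that stands in for HDF5 (it does not use any HDF5 library). It assumes all sources in a population share one start time `t0` and one sampling interval `dt`, packs their traces row by row into a single contiguous block of 64-bit floats (the in-memory analogue of one 2D dataset) and recovers a trace by slicing its row. The helper names `pack_uniform`, `get_trace` and `sampling_times` are hypothetical and not part of NSDF.

```python
from array import array


def pack_uniform(traces):
    """Pack equal-length traces {source_id: [values]} into one row-major block.

    Returns (source_ids, n_samples, data); row i of the flattened 2D block
    `data` holds the trace of source_ids[i].
    """
    source_ids = sorted(traces)
    n_samples = len(traces[source_ids[0]])
    data = array("d")  # 64-bit floats, stored contiguously
    for sid in source_ids:
        trace = traces[sid]
        if len(trace) != n_samples:
            raise ValueError(f"source {sid!r} has {len(trace)} samples, expected {n_samples}")
        data.extend(trace)
    return source_ids, n_samples, data


def get_trace(source_ids, n_samples, data, sid):
    """Return one source's trace by slicing its row out of the flat block."""
    row = source_ids.index(sid)
    return data[row * n_samples:(row + 1) * n_samples]


def sampling_times(t0, dt, n_samples):
    """Uniform sampling (assumed): all times follow from t0 and dt."""
    return [t0 + i * dt for i in range(n_samples)]


if __name__ == "__main__":
    traces = {"soma_1": [-65.0, -64.8, -64.5], "soma_0": [-70.0, -69.9, -69.7]}
    ids, n, data = pack_uniform(traces)
    print(ids)  # ['soma_0', 'soma_1']
    print(list(get_trace(ids, n, data, "soma_1")))  # [-65.0, -64.8, -64.5]
    print(data.itemsize * len(data))  # 48 (2 sources x 3 samples x 8 bytes)
```

Because every row has the same length, the position of a sample is computed rather than searched for, and the sampling times need not be stored separately for each source. Row order in this sketch is simply the sorted source IDs; in the layout of Figure 1, that order is what the source-ID dimension scale records.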
Figure 1: Structure of an NSDF file. Uniformly sampled data from each population of sources (neurons, compartments, etc.) are stored in 2D datasets under the same group. The first dimension of each dataset is mapped to a dimension scale containing the source IDs, and the second dimension to a dimension scale containing the sampling times. Spike times from each population are stored as 1D datasets, one per source, and the sources are mapped to these datasets by a mapping dataset for each population. The same scheme applies to nonuniformly sampled continuous data, except that these datasets are mapped to time dimension scales holding their own sampling times.
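The event-data part of Figure 1 can be mimicked with plain Python containers. The sketch below is only a model under stated assumptions: dicts and lists stand in for HDF5 groups and datasets, spike times are assumed sorted, and the functions `build_event_group` and `spikes_of` and the `spikes_<i>` naming scheme are hypothetical. Each source gets its own 1D list of spike times, and a table of `(source_id, dataset_name)` rows plays the role of the per-population mapping dataset.

```python
from bisect import bisect_left, bisect_right


def build_event_group(spike_trains):
    """Model one population's event data.

    spike_trains maps source_id -> sorted list of spike times. Returns a dict
    holding one 1D 'dataset' per source and a 'mapping' of
    (source_id, dataset_name) rows, mirroring Figure 1.
    """
    datasets = {}
    mapping = []
    for i, sid in enumerate(sorted(spike_trains)):
        name = f"spikes_{i}"  # hypothetical dataset naming scheme
        datasets[name] = list(spike_trains[sid])
        mapping.append((sid, name))
    return {"datasets": datasets, "mapping": mapping}


def spikes_of(group, sid, t_start=None, t_stop=None):
    """Return one source's spike times, optionally restricted to [t_start, t_stop]."""
    name = dict(group["mapping"])[sid]  # resolve source ID -> dataset via the mapping
    times = group["datasets"][name]
    lo = 0 if t_start is None else bisect_left(times, t_start)
    hi = len(times) if t_stop is None else bisect_right(times, t_stop)
    return times[lo:hi]


if __name__ == "__main__":
    group = build_event_group({"cell_2": [5.0, 12.5, 30.1], "cell_1": [3.2, 40.0]})
    print(group["mapping"])  # [('cell_1', 'spikes_0'), ('cell_2', 'spikes_1')]
    print(spikes_of(group, "cell_2", 10.0, 31.0))  # [12.5, 30.1]
```

Keeping a separate mapping means dataset names need not encode the source identifiers, and because each train is sorted, a time window can be extracted by binary search instead of a full scan.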
Table 1: Example use cases
This work is supported by the EC-FP7-PEOPLE-sponsored NAMASEN Marie Curie Initial Training Network (grant no. 264872).
Garcia S, Guarino D, Jaillet F, Jennings T, Pröpper R, Rautenberg PL, Rodgers CC, Sobolev A, Wachtler T, Yger P and Davison AP (2014) Neo: an object model for handling electrophysiology data in multiple formats. Front. Neuroinform. 8:10. doi: 10.3389/fninf.2014.00010
PLOS Editorial and Publishing Policies - Sharing of Data, Materials, and Software. Available online at: http://www.plosone.org/static/policies#sharing
Science - General information for authors. Available online at: http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail
Nature - Data deposition policies. Available online at
Klein L and Taaheri A (2007) HDF-EOS5 Data Model, File Format and Library. SE-RFC-008v1.0 Recommended Standard. Available online at http://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/ese-rfc-008/ESDS-RFC-008v1.0.pdf
de Buyl P, Colberg PH and Höfling F (2014) H5MD: a structured, efficient, and portable file format for molecular data. Comput. Phys. Commun. doi: 10.1016/j.cpc.2014.01.018
Głąbska H, Chintaluri HC and Wójcik DK (2014) Collection of simulated data for validation of methods of analysis of extracellular potentials. Neuroinformatics 2014.
Neuroinformatics 2014, Leiden, Netherlands, 25 Aug - 27 Aug, 2014.
Poster, not to be considered for oral presentation
(2014). Neuroscience Simulation Data Format (NSDF): HDF-based format for large simulation datasets.
04 Apr 2014;
04 Jun 2014.
Mr. Hanuma Chaitanya Chintaluri, Nencki Institute of Experimental Biology, Warsaw, Poland, email@example.com