Event Abstract

SciCrunch: A cooperative and collaborative data and resource discovery platform for scientific communities

  • 1 University of California, San Diego, Center for Research in Biological Systems School of Medicine, United States
  • 2 University of California, San Diego, San Diego Supercomputer Center, United States
  • 3 Yale University, Center for Medical Informatics, United States
  • 4 Yale University, Department of Neurobiology, United States

Introduction

SciCrunch was designed to help communities of researchers create their own portals providing access to the resources, databases, and tools relevant to their research areas. A data portal that searches across hundreds of databases can be created in minutes. Communities can choose from the existing SciCrunch data sources and also add their own. SciCrunch was also designed to break down the silos that form when each community builds its own separate portal, so that communities can take advantage of work done by others and share their own expertise. When a community brings in a data source, that source becomes available to all other communities, ensuring that valuable resources reach those who might need them. At the same time, each community can customize how these resources are presented to its own users. To ensure proper credit and to help share expertise, every resource is tagged with the communities that contributed it and the communities that access it.

Exploring Data

SciCrunch is one of the largest aggregations of scientific data and tools available on the Web. One can think of SciCrunch as a “PubMed” for tools and data: just as PubMed lets you search across all of the biomedical literature regardless of journal, SciCrunch lets you search across hundreds of databases and hundreds of millions of data records from a single interface. Such databases are considered part of the “hidden web” because their contents are not easily reached by conventional search engines. SciCrunch enhances search with semantic technologies, such as ontologies, so that relevant results are not missed. SciCrunch provides three primary searchable collections:

  • SciCrunch Registry – a curated catalog of thousands of research resources (data, tools, materials, services, organizations, core facilities), focusing on freely accessible resources available to the scientific community. Each research resource is categorized by resource type and given a unique identifier.
  • SciCrunch Data Federation – provides deep query access to the contents of databases created and maintained by independent individuals and organizations. Each database is aligned to the SciCrunch semantic framework so that users can browse its contents quickly and efficiently; users are then taken to the source database for further exploration. SciCrunch uses a dedicated data ingestion platform that makes it easy for database providers to make their resources available through SciCrunch. Using this platform, SciCrunch currently provides access to over 200 independent databases comprising approximately 400 million data records.
  • SciCrunch Literature – provides a searchable index of the literature, covering PubMed as well as full-text articles from the Open Access literature.

SciCrunch Communities

SciCrunch currently supports a diverse collection of communities (Figure 1), each with its own data needs:

  • CINERGI – focuses on constructing a community inventory and knowledge base of geoscience information resources, to help researchers find resources across disciplines, assess their fitness for use in specific research scenarios, and integrate and reuse data from multiple domains. The project team envisions a comprehensive system linking geoscience resources, users, publications, usage information, and cyberinfrastructure components. This system would help geoscientists across all domains make efficient use of existing and emerging resources for productive and transformative research.
  • Monarch Initiative (http://monarchinitiative.org; Figure 2) – is developing tools that use semantics and statistical models to support navigation through multi-scale spatial and temporal phenotypes across in vivo and in vitro model systems, in the context of genetic and genomic data. These tools will provide basic, clinical, and translational researchers, informaticists, and medical professionals with an integrated interface and a set of discovery tools to reveal the genetic basis of disease, facilitate hypothesis generation, and identify novel candidate drug targets. The goal of the system is to promote true translational research by connecting clinicians with model systems and with researchers who might shed light on related phenotypes, assays, or models.
  • Neuroscience Information Framework (NIF) – is a search engine for biological data that allows students, educators, and researchers to navigate the Big Data landscape by searching the contents of data resources relevant to neuroscience, providing a platform for pulling together information about the nervous system. Underlying the NIF system is the Neurolex knowledge base, which seeks to define the major concepts of neuroscience (e.g., brain regions and cell types) in a machine-readable way.
  • NIDDK Information Network (dkNET) – serves basic and clinical investigators by providing seamless access to large pools of data relevant to the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The portal contains information about research resources such as antibodies, vectors, and mouse strains, as well as data, protocols, and literature.
  • Research Identification Initiative (RII) – aims to promote research resource identification, discovery, and reuse. The RII portal offers a central location for obtaining and exploring Research Resource Identifiers (RRIDs), persistent and unique identifiers for referencing research resources. A critical goal of the RII is the widespread adoption of RRIDs to cite resources in the biomedical literature. RRIDs use established community identifiers where they exist; where more than one identifier exists for a single resource, the identifiers are cross-referenced in our system.
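
The RRID cross-referencing idea can be illustrated with a minimal sketch. It assumes a simple in-memory registry in which each RRID maps to the set of community identifiers known for the same resource, plus a reverse index so that any of those identifiers resolves back to one RRID. The class `RRIDRegistry`, its methods, and all identifier strings are hypothetical and do not describe SciCrunch's actual data model or API.

```python
from typing import Optional


class RRIDRegistry:
    """Hypothetical sketch: cross-reference RRIDs with community identifiers.

    Not the SciCrunch implementation; an illustration of the idea only.
    """

    def __init__(self) -> None:
        # RRID -> set of other identifiers for the same resource
        self._aliases: dict[str, set[str]] = {}
        # other identifier -> the single RRID it is cross-referenced to
        self._reverse: dict[str, str] = {}

    def register(self, rrid: str, *community_ids: str) -> None:
        """Record an RRID together with any existing community identifiers."""
        # Validate everything first so a conflict leaves no partial update.
        for cid in community_ids:
            owner = self._reverse.get(cid)
            if owner is not None and owner != rrid:
                raise ValueError(f"{cid!r} is already cross-referenced to {owner!r}")
        aliases = self._aliases.setdefault(rrid, set())
        for cid in community_ids:
            aliases.add(cid)
            self._reverse[cid] = rrid

    def resolve(self, identifier: str) -> Optional[str]:
        """Return the RRID for an RRID or any cross-referenced identifier."""
        if identifier in self._aliases:
            return identifier
        return self._reverse.get(identifier)

    def aliases(self, rrid: str) -> set[str]:
        """Return a copy of the identifiers cross-referenced to an RRID."""
        return set(self._aliases.get(rrid, set()))


if __name__ == "__main__":
    registry = RRIDRegistry()
    # Hypothetical identifiers, for illustration only.
    registry.register("RRID:EXAMPLE_0001", "sourceA:12345", "sourceB:ab-678")
    print(registry.resolve("sourceB:ab-678"))  # RRID:EXAMPLE_0001
    print(registry.resolve("sourceC:unknown"))  # None
```

The reverse index makes lookup by any alias a single dictionary access, and rejecting an identifier that already points at a different RRID reflects the requirement that each RRID uniquely denote one resource.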

Figure 1
Figure 2

Acknowledgements

This work was partly supported by the NIH Neuroscience Blueprint under contract HHSN27120080035C, the National Institute of Diabetes and Digestive and Kidney Diseases under grant U24DK097771, and the National Institute on Aging under grant 1R03AG043018.

Keywords: big data, big data integration, semantic data, Open Data, Data Federation, ontologies, Portal System

Conference: Neuroinformatics 2014, Leiden, Netherlands, 25 Aug - 27 Aug, 2014.

Presentation Type: Demo, to be considered for oral presentation

Topic: Infrastructural and portal services

Citation: Grethe JS, Bandrowski A, Banks DE, Condit C, Gupta A, Larson SD, Li Y, Ozyurt IB, Stagg AM, Whetzel PL, Marenco L, Miller P, Wang R, Shepherd GM and Martone ME (2014). SciCrunch: A cooperative and collaborative data and resource discovery platform for scientific communities. Front. Neuroinform. Conference Abstract: Neuroinformatics 2014. doi: 10.3389/conf.fninf.2014.18.00069

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 28 Apr 2014; Published Online: 04 Jun 2014.

* Correspondence: Dr. Jeffrey S Grethe, University of California, San Diego, Center for Research in Biological Systems School of Medicine, La Jolla, CA, 92093-0446, United States, jgrethe@ncmir.ucsd.edu