Data Ontology and an Information System Realization for Web-Based Management of Image Measurements

Image acquisition, processing, and quantification of objects (morphometry) require the integration of data inputs and outputs originating from heterogeneous sources. Managing the data exchange along this workflow in a systematic manner poses several challenges, notably the description of the heterogeneous meta-data and the interoperability between the software tools used. The use of integrated software solutions for morphometry and management of imaging data, in combination with ontologies, can reduce meta-data loss and greatly facilitate subsequent data analysis. This paper presents an integrated information system, called LabIS. The system has two objectives: (i) to automate the storage, annotation, and querying of image measurements and (ii) to provide means for sharing measurement data with third-party applications using open standard communication protocols. LabIS implements a 3-tier architecture with a relational database back-end and an application logic middle tier realizing a web-based user interface for reporting and annotation and a web-service communication layer. The image processing and morphometry functionality is backed by interoperability with ImageJ, a public domain image processing program, via integrated clients. Instrumental for the latter was the construction of a data ontology representing the common measurement data model. LabIS supports user profiling and can store arbitrary types of measurements, regions of interest, calibrations, and ImageJ settings. Interpretation of the stored measurements is facilitated by atlas mapping and ontology-based markup. The system can be used as an experimental workflow management tool allowing for description and reporting of the performed experiments. LabIS can also be used as a measurement repository that can be accessed transparently by computational environments, such as Matlab. Finally, the system can be used as a data sharing tool.


INTRODUCTION
Processing and extraction of information from images have become indispensable aspects of the experimental workflow in life science research, in particular in cell biology and neuroscience. Life science experimentation can be described in terms of processing workflows. A workflow representation provides an abstracted view over the experiment being performed. It describes what procedures need to be enacted, but not necessarily all the details of how they will be executed. A typical microscopic workflow involving problem definition, experimental execution, and data acquisition is summarized in Figure 1. At its last stage, the images are transformed into measurements, which are finally interpreted in the light of the original research question. Each step of the data processing workflow substantially decreases the volume of output data (i.e., the input information for the next step). At the same time, this decrease is accompanied by an increase in the algorithmic complexity of the generated information (derived data). Given this fact, if one wishes to automate data analysis and management in an experiment, all facets of the process outlined above need to be systematically framed. If this is not achieved, every step of the process will lead to incremental, irreversible loss of potentially valuable contextual information. Indeed, such loss of information is a typical problem in research laboratories. Therefore, the management of the information flow, and of the acquired images obtained during an experiment in particular, can be considered a major challenge in life science imaging.
Each step of the experimental workflow typically requires the use of different hardware and software tools. Ideally, obtained raw data should be directly available for use in other applications either by remote instrument read-out or by transfer of the data upon application request.
Differences in the posed research questions result in differences in the pursued methodological approaches and hence in differences in the information context along the workflow. This leads to an irreducible heterogeneity of the produced contextual information, which is difficult to standardize and describe in detail by a generic information system. In such cases, it is convenient to adhere only to minimal standards regarding the data schema structure in order to maintain interoperability between different analysis platforms. This situation has already been recognized for microarray experiments, where the Minimum Information About a Microarray Experiment (MIAME) standard was adopted to facilitate presenting and exchanging microarray data (Brazma et al., 2001).
As discussed, the meta-data may refer to the experimental design; to the source, preparation, treatment, or other relevant properties of the biological material being studied; to the parameters and values of instruments used; or to the analytical procedures performed on the measurements. Because there are no generally accepted standards for meta-data, equipment vendors define their own meta-data formats, which are often incompatible with those developed by competitors. Similarly, there are no commonly accepted standards for the exchange of derived imaging data, such as spatial measurements, temporal sequences, or geometric objects. By some estimates, there are approximately 80 proprietary file formats for optical microscopy alone that must be supported by any imaging tool for life science microscopy which aims to provide a general purpose solution. This lack of standardization of the meta-data has been recognized as a hindrance for the microscopic field (Goldberg et al., 2005; Linkert et al., 2010). Consequently, some standards are currently under discussion (Linkert et al., 2010). However, the proposed standards are not yet sufficiently accepted by the research community and therefore reflect only the experimental paradigms of their supporters. Altogether, this hinders the integration of the heterogeneous data sources which are present in the life science workflow.

RELATED WORKS
The Open Microscopy Environment (OME; Goldberg et al., 2005) was designed as a system for image storage, visualization, and analysis. The information system employs an extended meta-data file format for raw data storage, which is based on the TIFF and XML standards. Its main information objects are centered around acquisition meta-data (Goldberg et al., 2005). The information system's back-end is a Java Enterprise server application. There are two server realizations: a legacy OME Server developed until 2006 and a newer OMERO server. The new server employs Java remote objects and structured annotations as its main technologies. Each annotation can be mapped to custom XML name-spaces. The OMERO clients allow the user to manage, view, annotate, and measure multi-dimensional remote images.
The Cell-Centered Database (CCDB) project realizes a web accessible database for high-resolution multi-dimensional data from light and electron microscopy. The system has a grid-based architecture. Imaging data in the CCDB can be accessed through searching or by browsing through the image gallery. The Animal Imaging Database (AIDB) is a sub-project of CCDB, designed to provide a public repository for animal imaging data sets from MRI and related techniques. The open-source branch of the project is called OpenCCDB. OpenCCDB provides secure input forms and allows profiled data access and sharing. The database is implemented in PostgreSQL. The information system comprises several integrated software tools: (i) WebImage Browser, a web-based tool for viewing and annotating images similar to Google Maps; (ii) a segmentation and analysis tool for electron tomographic data, called Jinx; (iii) an image workflow application for registering brain images to stereotaxic coordinate systems, called Jibber, which is implemented in Java and available through the Java Web Start technology; (iv) Cytoseg, an extensible tool developed to automate segmentation problems in electron microscopic images, based on the OpenCV (Open-Source Computer Vision) library and Matlab.
The Bio-Image Semantic Query User Environment (Bisque) is a web-based platform designed to provide organizational and quantitative analysis tools for 5D image data (Kvilekval et al., 2010). Bisque's extensibility stems from its flexible meta-data and open web-based architecture. The system architecture is scalable, reconfigurable, and extensible, with core functions implemented as web-services. Most Bisque services are implemented using the REST access pattern; notably, all managed items, such as images, datasets, meta-data, analyses, and services, are given unique URLs allowing simplified access by third-party applications. The system uses a flexible meta-data model based on tags (i.e., named fields with associated values). Bisque is not designed to store image measurements but only annotations to images.
The Extensible Neuroimaging Archive Toolkit (XNAT) is an extensible platform for secure management and exploration of neuroimaging data (Marcus et al., 2007). XNAT supports workflows including data validation through an online quality control process. XNAT also includes an online image viewer that supports a number of common neuroimaging formats, including DICOM and Analyze. The viewer can be extended to support additional formats and to generate custom displays. The XNAT DICOM tools are written in Java, and XNAT supports the configuration of pipelines for data transfer and quality control checks.

DEFINITIONS AND TERMS
Instrumental for the subsequent discourse are the concepts of "measurement" and "ontology." The term measurement refers to succinct quantitative representations of image features over space and time. This implies the application of the act or process of measurement, i.e., "morphometry," to the raw imaging data. Therefore, measurement is synonymous with morphometric feature. Measurements, therefore, are reduced representations of the raw data, which, in the light of the previous discourse, have higher information complexity. Due to the irreversible information loss introduced by the process of measurement, one needs an instance of (or a reference to) the applied algorithm in order to be able to replicate the measurements (given the original data). Therefore, measurements are only implicitly "present" in images. Consequently, measurements cannot be regarded as image "features." This consideration introduces a distinction between "image-centric" and "process-centric" approaches to the management of experimental information. An example of the former approach is the web application Open Microscopy Environment (OME; Goldberg et al., 2005). The term ontology refers to a controlled vocabulary (i.e., data dictionary) about a specific area of knowledge. Examples of such areas include species anatomy, classes of chemical compounds, types of biological processes, cellular phenotypes, and nosological entities. Ontologies consist of hierarchical classifications of entities linked with statically defined relationships. The structure of an ontology can be represented by a graph, for example a conceptual map. Ontologies can be viewed as mediators of data sharing and exchange between heterogeneous software applications.
Heterogeneity in this context has several aspects: on the level of the hardware (i.e., different byte representations), the operating system, the programming language (i.e., different data type representations), the data source (i.e., the database management system), and finally the data schema. By providing explicitly the relationships between the entries in the vocabulary and the constraints on the data, ontologies provide the means for algorithmic translation of the data structures and instantiation of the data objects in different programming language environments. Increasingly, biomedical researchers are looking to develop ontologies to support cross-laboratory data sharing and integration (Lependu et al., 2008). Examples are the Gene Ontology (GO) for gene products (Ashburner et al., 2000) and the Unit Ontology (UO). Such ontologies can be found in ontology repositories around the world, for example at the Ontology Lookup Service (OLS; Côté et al., 2006, 2008), the Open Biological and Biomedical Ontologies web site, or the US National Center for Biomedical Ontology BioPortal web site. At the time of writing there were 81 such ontologies present at the OLS hub and more than 200 at the BioPortal site.

SYSTEM REALIZATION
Since imaging data are produced in a defined experimental context (Figure 1), all major steps of the experimental workflow need to be addressed by an information system, e.g., a laboratory management information system, in order to manage the increase in information complexity in a systematic manner. Such a system need not be universal, due to the intrinsic heterogeneity of the experimental context. Rather, it needs to properly address the stages where transformation of the information context occurs. This context needs to include structured information about the experimental conditions, the followed procedures and protocols, and finally the instrument settings by means of which the raw images are produced.
Brain connectivity datasets (i.e., connectomes) are graph representations of spatial relationships between anatomic structures at different levels of detail. Connectivity maps can be considered as derived properties of the imaged sample, which strongly depend on the imaging methods and the anatomical techniques used to acquire the data. As such, connectomes belong to the "interpretation" level of the workflow, since they are a combination of prior knowledge, which can be captured by the use of a certain vocabulary (i.e., ontology), and spatial relationships between the constructed information objects. Therefore, connectomes should be backed by information systems capturing the data at several nodes of the workflow. In this way, connectomes can be computed dynamically on demand based on the available data in the information system, given the constraints of the query and the prior knowledge.
In this paper, I present LabIS, an integrated information system for storage, annotation, and querying of multiple sets of image measurements. The aim of LabIS is to facilitate the information flow in the life science workflow. To this end, its development pursues three specific objectives. The first objective of LabIS is to automate the querying of the stored data and the generation of reports on them. The second objective is to automate the process of storage, annotation, and retrieval of image measurements. The third objective is to provide means for data sharing by providing interoperability between different applications consuming measurement data. Instrumental for this objective is the construction of a data ontology representing the common measurement data model, which is shared by the client and server software platforms. Development snapshots were presented previously in Prodanov (2008, 2009). A developmental instance of the system is available through the website Sourceforge.net at http://labinfsyst.sourceforge.net/. The system is distributed under the GNU Lesser General Public License (LGPL).

USER INTERFACE
This section addresses the first objective of LabIS, notably the automation of querying and report generation for the stored data. The user interface (UI) of LabIS is divided into five main modules: Project planning, Subject management, Manipulation management, Image Measurements and Morphometry, and System Administration. Screenshots of the user interface can be found in Appendix A. The user-interface modules are organized in a similar manner: users can generate reports, enter data, or annotate already present database records.
The Project planning module is used for the management of the records of research projects. The users can perform tasks such as deployment of new projects and/or changing the state or the attributes of ongoing projects. Groups of results can be organized in Results collections. Results collections in turn can be interpreted in the sense of mapping to atlas image datasets. Project characteristics can be reported in a flexible manner. The Subject management module manages the records for experimental animals. The users can perform tasks such as registration of new subjects/animals, editing of records, introduction of new species, etc. The subjects can be assigned to projects and to experimental groups. Dynamic reports can be generated for arbitrary periods.
The Manipulation management module manages the records for performed manipulations. The users can perform tasks such as registration and editing of manipulations. Dynamic reports can be generated for arbitrary periods.
The Image Measurements and Morphometry module manages uploaded measurement records. Uploaded measurements can be associated with a project, an experimental subject, an experimental group, a sample, or a result collection, or paired to other measurements. The measurements can be searched for by the name of the measured image or by the internal ID, or simply browsed. There are possibilities for flexible reporting of the performed measurements. A dynamically generated measurement report is presented in Figure 2.
The Administration module manages the user roles, the maintenance of the database, and the system configuration. The users can also define custom ontologies.
A typical workflow that can be supported by the system is depicted in Figure 4. The user can describe a new project and specify its attributes in the Project module. Then she can define (experimental) groups related to the project. At a later stage, the user can enter the manipulations that were performed and the experimental subjects or samples. Independently, the user can upload measurements and annotate them with relevant attributes. The measurements can be collated in results collections, which in turn are associated with projects.
Data can be annotated in several ways. First, measurements can be annotated in free text through the ImageJ clients (primary annotation). Second, measurements can be associated with hierarchical groups, experimental subjects, projects, and ontologies (data curation). Third, uploaded measurements can be registered in associated atlas spaces.

USER ROLES
Access to the system is password protected. The information system supports hierarchical user roles. There are three major types of users: ordinary users, editors, and administrators. Ordinary users can enter or annotate data and produce basic reports. Editors can edit already entered data. Administrators can create or delete other users or change their roles. They can also delete already entered data records.

SYSTEM ARCHITECTURE
This section addresses the realization of the second objective of LabIS, in particular to provide structured information context for the derivative image measurements. LabIS is a blend of multiple high-level server-side and client-side software technologies, such as the Structured Query Language (SQL) for relational database interaction, the Java programming language (Oracle Inc., USA), the Extensible Markup Language (XML), and the server-side hypertext preprocessor language PHP, among others. LabIS realizes a 3-tier architecture including a user interface front-end, an application middle tier, and a relational database back-end (Figure 3). Such an architecture provides flexibility regarding the exploitation and deployment options. The system was tested in both MS Windows and Linux development environments.

Server-side
The back-end data storage layer is implemented in a MySQL database server (v. 5.1). The version that was used supports stored procedures, views, and transactions.
The middle tier realizes the system application logic and the information objects. The middle tier is implemented in the web server scripting language PHP (v. 5.x). The PHP scripting engine runs in the environment of an Apache web server.
The front-end realizes interactions with users and software clients. It includes a web user interface and bindings for Ajax and SOAP (Section 2.5).

Client side
The client side is realized by 3 types of clients: (i) web browser accessing the web UI (Section 2.1); (ii) web clients accessing the web-service interface via Ajax and SOAP bindings (Section 2.5); (iii) ImageJ modules accessing the database server or the webservice interface (Section 3.2).

DATA MODEL
The information object model and its relationship to the system modules of LabIS are depicted in Figure 4. Information objects of LabIS can be divided in two categories: infrastructure objects and application objects. The objects are mapped to the table structure of the database, the elements of the UI, and to the communication infrastructure. Examples of the former are the Sessions and Users, which handle authentication and session tracking; or the Groups, which handle the classification of projects and experimental subjects. Examples of the latter are the Measurements, Images, ROIs, and Calibrations, which all are mapped to the communication infrastructure.

Measurement ontology
Realization of a data ontology was instrumental for the data exchange between heterogeneous software applications. In the context of LabIS, the data ontology is used to map the data from one programming language to another, for example from PHP to Java, from XML to PHP, or from JavaScript to PHP. In XML, the ontology was implemented as the name-space IJMes, representing an extension of the public XML Schema name-space. In this way, any application that can decode the scheme will instantiate the ontology and recover the meaning of the transmitted data. Using this framework, the internal representation of the data was decoupled from its publicly accessible form. This extensible ontology-based data exchange model was designed in order to ensure continuous evolution in the changing scientific environment. A key concept for the realization of the measurement ontology is the representation of the image measurement by the Measurement and Measurement record objects (Figure 5). The object can either be constructed from data on the client side and sent to the server, or be instantiated from a serialized form residing in the database and sent either to the UI (Figure 2) or to the client side. The data schema of the measurement objects reflects closely the format of the image measurements produced by ImageJ and is the foundation for the interoperability between LabIS and ImageJ.
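For illustration, a Measurement serialized in the IJMes name-space might take a form similar to the following sketch; the element and attribute names here are assumptions for illustration, not the actual published schema:

```xml
<!-- Hypothetical serialized Measurement; element/attribute names are illustrative -->
<ijmes:measurement xmlns:ijmes="http://labinfsyst.sourceforge.net/IJMes">
  <ijmes:image name="section_12.tif" width="1024" height="768"/>
  <ijmes:calibration pixelWidth="0.32" pixelHeight="0.32" unit="um"/>
  <ijmes:record parameter="Area" unit="um^2">153.7</ijmes:record>
</ijmes:measurement>
```

Any client that understands the name-space can instantiate the corresponding Measurement object from such a document, regardless of its implementation language.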
Each measurement parameter type present in the database is associated with an ontology entry. The hierarchy of entries is represented in Figure 5B.
Data ontologies can also be used for automated annotation of the experimental data. In the course of an experiment part of the meta-data can be automatically generated at different processing steps while other parts can be introduced as annotations by users (i.e., experts).

COMMUNICATION INFRASTRUCTURE
LabIS is a distributed Internet and intranet application. The system supports interactions with third-party applications using two access interfaces (Figure 3): (i) direct communication with the database back-end and (ii) communication via the web-service interfaces. The direct communication with the database is intended for intranet environments, where the security requirements are lower. In intranet environments the data input is performed in a platform-independent manner via SQL queries.

The web-service interface
Web-services are a way of aggregating and integrating data sources and software by using standardized interface and service discovery mechanisms. In the context of LabIS, the web-service interface is operated by the Object Server (Figure 3). The interface is intended for use in Internet environments, where opening of the database port may impose a security risk. The web-service interface interacts via the data ontology and is, therefore, at a higher level of abstraction compared to the database structure. Due to the use of platform-independent formats, such as XML and JSON, LabIS supports cross-platform data exchange in a generic manner.
The Object Server provides two communication protocol bindings to clients: (i) a SOAP binding and (ii) a JSON-RPC (JavaScript Object Notation Remote Procedure Call) binding. Both protocols operate over a Hypertext Transfer Protocol (HTTP) transport layer. Stored data are wrapped as ontology-defined information objects and transmitted as dynamically generated XML or JSON documents over the HTTP transport layer.

SOAP protocol binding
SOAP provides a way to communicate between applications running on different operating systems, with different technologies and programming languages. SOAP is a protocol for exchanging XML-encoded messages over HTTP/HTTPS (in most cases) or SMTP. Version 1.2 became a W3C Recommendation in 2003. One of the key advantages of SOAP is that it has a publicly accessible service discovery mechanism through the use of the Web Services Description Language (WSDL). A human-readable view of the service broadcast mechanism of LabIS is displayed in Figure 6.
There are implementations for the most common programming languages, such as Apache Axis for Java; PHP-SOAP, PEAR-SOAP, and NuSOAP for PHP; the SOAP extensions for the .NET Framework, etc. The SOAP binding for LabIS was implemented using NuSOAP, developed by Dietrich Ayala and currently supported by the company NuSphere. More details about the protocol and its realization are given in the Appendix.
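For illustration, a SOAP request to such a service wraps the call in an XML envelope of the following general form; the operation name getMeasurement and its parameter are hypothetical, not taken from the actual LabIS WSDL:

```xml
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- Hypothetical operation; the real method names are published via WSDL -->
    <getMeasurement xmlns="http://labinfsyst.sourceforge.net/IJMes">
      <measurementId>42</measurementId>
    </getMeasurement>
  </soap:Body>
</soap:Envelope>
```

The server replies with a matching envelope whose body carries the serialized Measurement object described in the previous section.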

JSON-RPC protocol binding
The JavaScript patterns for asynchronous communication with application web servers are collectively referred to as Ajax. Ajax includes the combined use of several technologies in web browsers (notably JavaScript and the Document Object Model). Using Ajax, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. The page can interact with the JavaScript technology based on events such as the loading of a document, a mouse click, focus changes, or even a timer. JSON-RPC is a light-weight communication protocol currently operating only over HTTP. In contrast to SOAP, JSON-RPC does not define name-spaces and only implicitly refers to complex types. There are implementations of this protocol in Java, PHP, and other languages. More details about the protocol and its realization are given in the Appendix.
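As a minimal sketch of the client side of such a binding, the following Java fragment assembles a JSON-RPC 1.0 request body of the kind that could be posted to the Object Server over HTTP; the method name and parameter are assumptions for illustration:

```java
// Minimal sketch of a JSON-RPC 1.0 request body; the method name
// "getMeasurement" and its parameter are hypothetical examples.
public class JsonRpcSketch {
    static String buildRequest(String method, String param, int id) {
        // JSON-RPC 1.0 request: method name, positional params, request id
        return "{\"method\":\"" + method + "\","
             + "\"params\":[\"" + param + "\"],"
             + "\"id\":" + id + "}";
    }

    public static void main(String[] args) {
        // The resulting document would be sent as the body of an HTTP POST
        System.out.println(buildRequest("getMeasurement", "img-0042", 1));
    }
}
```

In a real exchange the server's reply is a JSON document carrying the matching "id", a "result" field, and an "error" field that is null on success.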

INTEROPERABILITY
Interoperability with third party software is realized both on the client side and the server side.

INTEGRATION IN OLS
Support for third-party ontologies is provided using the publicly available OLS ontology registry web-service. This support is realized on two levels: (i) individual measurement types can be annotated with ontology keys, for example using the Unit Ontology; (ii) the complex measurement objects can be annotated using terms from any of the ontologies supported by OLS. The integration with OLS is transparent for the user and is realized using a cascade of JSON-RPC calls from the browser, which in turn trigger SOAP calls to the OLS web-service interface. This pattern of interaction is an example of mixed client-server interoperability, since a SOAP client instance also acts on the server side to access a remote web-service interface.

INTEGRATION IN IMAGEJ
ImageJ is a public domain image processing program. It has an open architecture providing extensibility via third-party Java modules (called plugins) and scripting macros. It has been developed by Wayne Rasband since 1997 and expanded via contributed software code from an international group of contributors (Collins, 2007).
LabIS can be accessed directly from ImageJ via GUI clients. In this way, the entire image processing functionality of ImageJ was made available to the end user. The user can perform arbitrary measurements using any type of built-in or customized ImageJ plugins. The GUI front-end clients were implemented as a set of plugins: the SQL Results plugin and the web-service SOAP Results and JSON Results plugins. The SQL plugin implements a MySQL client that interacts directly with the database server. It is intended for use in intranet environments. The web-service plugins implement SOAP and JSON-RPC clients and interact with the Object Server of LabIS (Figure 3). This functionality is an example of interoperability on the client side.
The data representing an individual measurement or an array of measurements (i.e., the ImageJ ResultsTable object), together with the relevant meta-data concerning the image under study, such as dimensions, calibration, path, regions of interest, etc., are assembled in a complex Measurement object. After the end of the measurement session, this object, together with a JPEG-encoded thumbnail view of the active image, is uploaded using either the SQL client or the web-service client. Known measurement unit types are associated automatically with terms in third-party ontologies, such as the Unit Ontology. The basic measurement unit types in ImageJ are: areas, diameters, perimeters, angles, circularity, coordinates in 2D and 3D, intensities, and calibrated pixel values. If a new measurement type is encountered, it is also automatically included in the database. Such a new type can later be annotated using the web UI and the ontology term lookup service.
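The assembly step can be sketched in plain Java as follows; this is an illustration of the idea rather than the actual LabIS client code, and the field names are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of assembling one measurement row plus image meta-data
// into a single serializable object before upload; field names are assumptions.
public class MeasurementSketch {
    // Serialize a flat field map into a JSON object string
    static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Number) sb.append(v);        // numbers unquoted
            else sb.append('"').append(v).append('"');    // strings quoted
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("image", "section_12.tif"); // measured image name
        m.put("parameter", "Area");       // measurement type
        m.put("value", 153.7);            // measured value
        m.put("unit", "um^2");            // unit, mappable to the Unit Ontology
        System.out.println(toJson(m));
    }
}
```

In the actual clients, the serialized object is what travels over the SQL or web-service channel to the server.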

INTEGRATION IN MATLAB
Integration in the Matlab® computational environment (The MathWorks Inc., Natick, MA, USA) was achieved on the client side. Since version R2007, Matlab® provides client functionality for web-services. The generation of client scripts is fully automated using the WSDL service discovery mechanisms. The high-level functionality is implemented by the createClassFromWsdl function, which accepts a path or URL to a WSDL resource as an argument. Low-level functionality is accessed by the use of the functions createSoapMessage, callSoapService, and parseSoapResponse.
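A minimal client session using the functions named above could look like the following sketch; the WSDL URL, the generated class name, and the service method are assumptions for illustration:

```matlab
% Generate a client class from the published WSDL (URL is hypothetical)
createClassFromWsdl('http://labinfsyst.sourceforge.net/services/labis.wsdl');
% Instantiate the generated class; its name is derived from the service name
srv = LabISService;
% Invoke a hypothetical service method and receive the result as a struct
m = getMeasurement(srv, 42);
```

From this point, the retrieved measurements are ordinary Matlab variables and can be fed directly into further computations.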

SPATIAL ANNOTATION AND ATLAS INTEGRATION
As of recently, LabIS provides atlas mapping and registration functionality, allowing the user to associate a Results collection with atlases and to map individual members of the collection to a normalized atlas imaging space. At present, only pseudo-3D datasets (i.e., series of 2D images) are supported, due to the paucity of publicly available high-resolution 3D atlas datasets on the web. The atlasing integration module of LabIS is backed by the spatial extensions of MySQL, which in turn realize a subset of the Open Geospatial Consortium (OpenGIS) specification functionality. Notably, those extensions compute spatial relationships between objects and facilitate the formulation of spatial queries.
A use case is demonstrated in Figure 7, where some measurements are mapped to a rat coronal histological atlas. This is achieved by integration with public atlas datasets, for example those available at http://brainmaps.org/. LabIS realizes generic atlas mapping functionality by using multi-resolution tiled images, for example those in the Zoomify (www.zoomify.com) format. The image rendering functionality on the client side is based on the Brain Maps web application programming interface (Mikula et al., 2007) and enhanced by the incorporation of JSON-RPC functionality. The client-server interaction is realized by invoking JSON-RPC calls on the server. The user can mark and label the imported atlas dataset. Enhanced functionality includes the drawing and labeling of ROIs, which can be stored in the database. This also provides possibilities for spatial querying of the measurement datasets.
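With the MySQL spatial extensions, such a spatial query over the stored measurements can be sketched as follows; the table and column names are hypothetical, not the actual LabIS schema:

```sql
-- Select all measurements whose points fall within the bounding
-- rectangle of a labeled atlas region (names are illustrative)
SELECT m.id, m.parameter, m.value
FROM measurements AS m
JOIN atlas_regions AS r ON r.label = 'CA1'
WHERE MBRContains(r.region, m.point);
```

Here MBRContains is one of the OpenGIS-derived relationship functions available in MySQL 5.1, operating on the minimum bounding rectangles of the stored geometries.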

DISCUSSION
Key features of the presented information system can be discussed in several aspects. On the application level, advantages of the system are the demonstrated seamless interoperability between Java, PHP, and browser technologies (notably JavaScript) and the use of open communication standards, notably SOAP and JSON-RPC. On the system level, advantages of the system are the extendable data model, the independence from a particular programming language, and the scalability of the component technologies, resulting in overall scalability with regard to performance. On the level of exploitation and deployment, an advantage of the system is the use of open-source platforms, which are available as standard hosting options in most web hosting services.
The design of the information system follows several basic principles. In the first place, LabIS incorporates commonly accepted basic software technologies. The information system uses open-source software technologies and open communication and data storage protocols. This choice was made to allow for better interoperability in the context of the experimental workflow (Figure 1). In the second place, to enforce data organization in a structured manner, LabIS realizes a centralized data storage model. It should be noted that LabIS is not a raw image database. The raw images remain in remote repositories, such as a local client file system or a third-party file server, while only references to them are stored centrally. In contrast, the imaging meta-data and the produced measurements are stored centrally in the relational database. Such an approach provides a definite advantage for the integration of third-party imaging data, such as large-scale digital atlases. It also increases the portability of the system, since its entire database can be easily copied from one host to another.
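The storage principle can be illustrated with a minimal relational sketch (SQLite standing in for MySQL): images are represented only by their URIs pointing at remote repositories, while measurements are stored centrally and joined to those references. Table and column names are illustrative, not the LabIS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (
    id  INTEGER PRIMARY KEY,
    uri TEXT NOT NULL               -- reference to the remote repository
);
CREATE TABLE measurements (
    id       INTEGER PRIMARY KEY,
    image_id INTEGER REFERENCES images(id),
    name     TEXT,
    value    REAL,
    unit     TEXT
);
""")
# Only the reference to the raw image is stored centrally ...
conn.execute("INSERT INTO images (id, uri) "
             "VALUES (1, 'file://server/slide_042.tif')")
# ... together with the measurements derived from it.
conn.execute("INSERT INTO measurements (image_id, name, value, unit) "
             "VALUES (1, 'area', 152.4, 'um^2')")
row = conn.execute(
    "SELECT i.uri, m.name, m.value FROM measurements m "
    "JOIN images i ON m.image_id = i.id").fetchone()
```

Moving the system then amounts to copying the database; the raw images stay where they are.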
In the third place, LabIS is designed and developed in a modular manner. Individual modules share the same information object model but are autonomous from each other. In this way, different modules can be introduced incrementally over time, depending on the actual needs of the end-users. Finally, since the types of imaging data and the measurements that are performed can change in the future, I designed an extendable data model for communication with third-party applications.

17 www.zoomify.com

APPLICATION DRIVERS
Until recently, management and storage of data were not considered as critical issues in the academic environments. Scientific journal articles were regarded as final and sufficient high-level summaries of the experimental findings.
However, the volumes of raw data generated by current imaging experiments require scalable data management and archiving solutions (Bjaalie, 2002). Prominent examples of this upscaling are the projects exploring brain connectivity at the microscopic scale through large-scale microcircuitry reconstructions and modeling, in species such as rat, the Blue Brain project (Markram, 2006); drosophila (Cardona et al., 2010); and human, the Human Connectome Project (Hagmann et al., 2010), although the latter depends on functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) data. This evolution mirrors the development of genomics and proteomics data warehouses [for example GenBank (Benson et al., 2000), the Structural Classification of Proteins database (SCOP; Andreeva et al., 2004), and the Protein Data Bank (PDB; Weissig et al., 2000)].
Moreover, with the advent of high-throughput and high-content imaging methods, considerable amounts of experimental data need to be stored and analyzed (Price et al., 2002; Manning et al., 2008; Watson, 2009). In electron microscopy, recently developed serial section transmission electron microscopy (ssTEM) techniques for serial imaging and reconstruction enable ultramicroscopic reconstruction of complete invertebrate nervous systems (Anderson et al., 2009; Cardona et al., 2010). Currently, quantitative analysis of molecular events in living organisms is performed with the combined application of imaging and genetic engineering technologies. Assays can include, for example, ligand screening (Loo et al., 2007; Szafran et al., 2008; Sutherland et al., 2011), cell transfection (Chang et al., 2004), and RNA interference (Echeverri and Perrimon, 2006; Moffat et al., 2006). High-throughput imaging also encompasses non-optical data from entire animals. Examples can be given in high-throughput MR imaging (Schneider et al., 2003; Pieles et al., 2007), computed tomography (CT; Johnson et al., 2006), and in vivo bioluminescence (Stell et al., 2007). High-content screening (HCS) combines the efficiency of high-throughput techniques with the ability of cellular imaging to collect quantitative data from complex biological systems. The HCS field has evolved from a technology used exclusively by the pharmaceutical industry for secondary drug screening to one used for primary drug screening and fundamental research in academia. Biologists can now prepare and automatically image thousands of samples per day, thus enabling chemical screens and functional genomics (for example, using RNA interference technology; Perlman et al., 2004; Loo et al., 2007). Typically, in a single experiment, tens to hundreds of experimental conditions are screened (Perlman et al., 2004; Echeverri and Perrimon, 2006; Moffat et al., 2006; Loo et al., 2007).
Output data can comprise, for example, cell counts, per-cell protein levels, and morphological parameters (such as sizes, cell/organelle shapes, or subcellular patterns of DNA or protein staining). Such experiments can be analyzed in a high-content manner using automated or semi-automated workflows, for example with open-source tools like CellProfiler. Commercial software of this type also exists, with its main application in the pharmaceutical screening market, from companies including Cellomics, TTP LabTech, Evotec, Molecular Devices, and GE Healthcare (Garippa, 2004). The application domains of these packages are mainly mammalian cell types and cellular features of pharmaceutical interest, including protein translocation, micronucleus formation, neurite outgrowth, and cell counts. It is plausible that the outlined application drivers will eventually lead to scaling behavior of biomedical imaging datasets mirroring Moore's law in computing hardware. An analogous trend has already been recognized in neural recordings (Stevenson and Kording, 2011).
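The shape of such per-cell output can be sketched as a list of records; the field names below are assumptions for illustration, not a specific tool's format.

```python
# Illustrative per-cell records as produced by a high-content screen:
# one dictionary per segmented cell, holding intensity and shape features.
cells = [
    {"cell_id": 1, "protein_level": 0.82, "area": 145.0, "shape_factor": 0.91},
    {"cell_id": 2, "protein_level": 0.35, "area": 98.5,  "shape_factor": 0.77},
    {"cell_id": 3, "protein_level": 0.60, "area": 120.2, "shape_factor": 0.85},
]

# Simple aggregate statistics per experimental condition
cell_count = len(cells)
mean_level = sum(c["protein_level"] for c in cells) / cell_count
```

Aggregating such records across tens to hundreds of conditions is exactly the step where a structured measurements repository pays off.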
In the past 10 years, there has been an increasing focus on developing novel image processing, data mining, and database tools (Peng, 2008). The described process of upscaling of research infrastructure also requires complementary development of software and communication protocols, collectively denoted as "scientific middleware" (Prodanov, 2008). Such middleware eventually allows researchers to transparently use and share distributed resources, such as computing time, raw data, network services, and instruments. Therefore, major design concerns in scientific middleware need to be scalability of performance and modularity: features not envisioned in the primary design should be addable in a manner that does not affect the core functionality. Examples can be given by the projects μManager (Edelstein et al., 2010), ImageJ, and CellProfiler.

INTEROPERABILITY ISSUES AND APPLICATION DOMAINS
Major issues for the management of biomedical imaging data can be outlined as (i) the lack of interoperability between the major acquisition and analysis packages and (ii) the intrinsic complexity and heterogeneity of the meta-data. From the neuroinformatics perspective, these issues are recognized as the "databasing challenge," notably the accumulation, storage, management, and sharing of data, and the "tools challenge," notably the development and sharing of tools for data analyses (Bjaalie, 2008). While the complexity issue was already outlined in the Introduction, the following section focuses on interoperability. Image acquisition, processing, and data analysis involve data exchange between heterogeneous hardware and software. This requires maintaining interoperability between multiple, and most often heterogeneous, software packages and hardware devices. Interoperability can be achieved by (i) adhering to common (i.e., mutually accepted) data exchange formats and communication protocols, or (ii) direct integration of hardware (resp. software) into acquisition or analysis platforms.
On the image acquisition level, the interoperability challenge is being addressed by the project μManager (Edelstein et al., 2010). The μManager system facilitates microscope control and image acquisition and aims at the development of open-source software for control of automated microscopes. Drivers for different microscopic components are supported in modular manner thus allowing independent setups to be built.
On the image storage level, interoperability is addressed by OME (respectively OMERO), which provides structured storage and data import from a great number of proprietary image file formats. According to Swedlow et al. (2003), the most important recommendations that would help bridge this heterogeneity are that (i) the meta-data should be readable by third-party software using a widely accepted package or library, (ii) commercial software programs need to provide data export to an open meta-data specification, and (iii) scientists should use image processing and analysis tools that preserve image meta-data. On the image processing level, the challenge is addressed by CellProfiler, which allows for the composition of complex processing workflows using a number of third-party tools.
On the measurement and application level, the interoperability is addressed by LabIS. Presented interoperability cases demonstrate the integration of LabIS in the experimental workflow on the level of the organization and interpretation of the derived data.
Functionality realized in LabIS can be used in several directions. In the first place, LabIS is a workflow management tool. As such, it allows for organization and reporting of the performed experiments. In the second place, LabIS can be used as a processed data (i.e., measurements) repository. Using the web-service or the database interfaces, the data can be accessed by computational and statistical environments, such as Matlab and the R language. In the third place, LabIS can be used as a data sharing tool. It allows publication of the raw experimental data with any desired degree of restriction of the details. Such data can be used by other research teams to support collaborations. In contrast to specific atlasing tools, such as the Rodent Brain Workbench 19 (Moene et al., 2007), LabIS is much less application-centric. Sharing of raw data and measurements in neuroscience is gaining momentum. It is believed that the widespread sharing of data and tools for neuroscientific research will accelerate the development of neuroinformatics (Eckersley et al., 2003). The exchange of raw neuroscience data between groups presents the opportunity to re-analyze previously collected data differently and encourages new interpretations (Amari et al., 2002). With the increase of experimental complexity and the size restrictions imposed by scientific publishers, essential experimental details are frequently omitted from the final peer-reviewed publications. This can eventually lead to unnecessary reproduction of experiments and a waste of time and resources. On the other hand, data sharing can reduce experimental and analytical error.

B.1. XML PRIMER
The Extensible Markup Language (XML) is a general-purpose open standard for creating custom markup documents. It is extensible in the sense that it allows its users to define their own elements using "name-spaces." Its primary purpose is to facilitate the sharing of information between information systems.
The specification of XML is maintained by the World Wide Web Consortium (W3C). The recommendation specifies both the lexical grammar and the requirements for parsing. An XML document may contain element or attribute names from more than one XML vocabulary. The XML name-spaces are controlled vocabularies containing collections of structuring elements which are not part of the original specification but comply with the XML grammar. They are defined by a W3C recommendation called Namespaces in XML.
XML Schema, published as a W3C recommendation in May 2001, is one of several XML schema languages. It is an XML language for describing data structures (i.e., schemas), prescribed by the W3C as the successor of DTDs. An XML Schema expresses a set of rules to which an XML document must conform in order to be considered "valid" according to that schema.
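The use of name-spaces can be illustrated with a small document combining two vocabularies and a standard-library parser; the element names and name-space URIs below are invented for the example.

```python
import xml.etree.ElementTree as ET

# A minimal document drawing on two (illustrative) name-spaces:
# one for measurement data and one for imaging meta-data.
doc = """<?xml version="1.0"?>
<lab:measurement xmlns:lab="http://example.org/labis"
                 xmlns:img="http://example.org/imaging">
  <img:source>slide_042.tif</img:source>
  <lab:value unit="um^2">152.4</lab:value>
</lab:measurement>"""

root = ET.fromstring(doc)
# The parser resolves each prefix to its URI; queries therefore
# use a prefix-to-URI mapping rather than the literal prefixes.
ns = {"lab": "http://example.org/labis",
      "img": "http://example.org/imaging"}
value = float(root.find("lab:value", ns).text)
unit = root.find("lab:value", ns).get("unit")
source = root.find("img:source", ns).text
```

Because names are qualified by URI rather than by prefix, two vocabularies can coexist in one document without collisions.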

B.2. SOAP PROTOCOL PRIMER
FIGURE A4 | The SOAP protocol. Interaction between the service consumer and the service provider (top). Elements of the SOAP message over HTTP (bottom).

B.3. MAIN FUNCTIONALITY OF THE WEB-SERVICE
The main operations and functionality of the SOAP binding are summarized in Tables A1 and A2. SOAP is a protocol for exchanging XML-based messages over computer networks, normally using HTTP/HTTPS. SOAP once stood for "Simple Object Access Protocol" but this acronym was dropped with Version 1.2 of the standard, as it was considered to be misleading.
The primitive data types are defined by the XML Schema xsd and xsi name-spaces, while the web-service components are defined by the Web Services Description Language (WSDL) name-space (Christensen et al., 2001a), which describes the public operations the web-service clients can execute (see Figure A4). The SOAP protocol itself is defined by three name-spaces: xmlns:soap, xmlns:SOAP-ENV, and xmlns:SOAP-ENC (Table A1). The ontology name-space is denoted as IJMes; it is an expansion of the SOAP and WSDL name-spaces.
SOAP forms the foundation layer of the web-services protocol stack, providing a basic messaging framework upon which abstract layers can be built. An example of SOAP message exchange is provided in Listings 1 and 2. The protocol is based on asynchronous exchange of messages. The key features of SOAP messages are that (i) a SOAP message is a valid XML document, (ii) it must use the SOAP Envelope name-space, and (iii) it must use the SOAP Encoding name-space.
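These properties can be sketched by constructing a minimal SOAP 1.1 envelope with the standard library; the operation name `getMeasurements` and the IJMes name-space URI are illustrative, not the actual LabIS WSDL contract.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
IJMES = "http://example.org/IJMes"  # hypothetical ontology name-space URI

# Keep the conventional SOAP-ENV prefix when serializing.
ET.register_namespace("SOAP-ENV", SOAP_ENV)

# Envelope > Body > operation call, as mandated by the SOAP structure.
envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
call = ET.SubElement(body, f"{{{IJMES}}}getMeasurements")
ET.SubElement(call, "experimentId").text = "42"

message = ET.tostring(envelope, encoding="unicode")

# A SOAP message is itself a valid XML document, so it round-trips:
parsed = ET.fromstring(message)
op = parsed.find(f"{{{SOAP_ENV}}}Body/{{{IJMES}}}getMeasurements")
```

Such a message would normally be posted over HTTP with a `SOAPAction` header; the response carries a matching Body holding either the result or a Fault element.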