Cyber-Workstation for Computational Neuroscience

A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface.


INTRODUCTION
In the last decade, there has been an explosion in the number of models and amount of physiological data that can be obtained from either basic neuroscience or computational neuroscience experiments (Trappenberg, 2002;Purves et al., 2004). The data from these experiments is beginning to be leveraged to address the problem of reverse-engineering the brain (Markram, 2006). This process has been identifi ed by the U.S. National Academy of Engineering as one of the 21st Century's great challenges for engineering and neuroscience because it has the potential to: (1) impact human health by providing a guide for repairing diseased neural tissue and (2) create innovation in neuromorphic systems based on details of brain function. The challenge arises due to both anatomical and behavioral complexity. Human brains contain approximately 10 15 synapses connecting 10 11 neurons of different classes which are hierarchically self-organized across multiple temporal and spatial scales (Abeles, 1991). Furthermore, the neuronal properties, structural networks, and functional networks can adapt based on experience and learning.
A prominent project addressing anatomical complexity is the Blue Brain Project which combines realistic neuron models (Markram, 2006). Currently, experimental data is fed to a computations (Wilson and Williams, 2009). Additionally, the ability for researchers to share (and improve) models has proven successful in projects like BCI2000 (Schalk et al., 2004) and Physiome (Asai et al., 2008). The Concierge platform at RIKEN provides centralized experimental results and analysis for researchers around the world (Sakai et al., 2007). Here we aim to incorporate aspects of all those systems into a modular, user-friendly workstation for in vivo experiments. We further believe that information technology has an important role to play in encapsulating many details of the laboratory work -translating advanced computational neuroscience methods into accessible tools for experimental neuroscience. The Cyber-Workstation (CW) developed here was designed with those philosophies in mind.
The CW provides a computational infrastructure for real-time neural systems modeling and collection (and storage) of multiple channels of neural and environmental data during behavior. Synchronous modeling and experimentation can provide a window to functional aspects of neural systems that combine sensory, cognitive, and motor processing while interacting with dynamic environments. The CW allows a neuroscientist to expand in vivo paradigms by providing as much computational power as needed.
The CW was developed around and will be demonstrated via a Co-Adaptive Brain-Machine Interface (CABMI). Brain-Machine Interfaces (BMIs) generally create a model of a specifi c brain subsystem's (e.g. motor, parietal) interaction with the external world  such that the BMI can decode user intention to control a prosthetic. Co-adaptive BMIs go a step further by engaging both the user and a computer model to learn to control a prosthetic's movements based on interaction with the environment (DiGiovanna et al., 2009). The CW is an ideal platform for CABMI approaches which naturally require concurrent modeling and experimentation. Although neuroscience encompasses many more topics than BMI; implementing a BMI is a microcosm of the challenges in blending computational and experimental neuroscience.
The next section specifi es system requirements for BMI research and design choices given our available computational resources. The section 'Architecture' describes the system architecture, how the middleware interfaces the software and hardware, and the webportal user interface. The section 'Evaluating the CyberWorkstation' introduces BMI and overviews the CABMI used to demonstrate some CW capabilities. The fi nal section discusses implications of the CW for computational neuroscience.

FUNCTIONAL SYSTEM REQUIREMENTS
The practical embodiment of our CW consolidated software and hardware resources across three collaborating labs: Neuroprosthetics Research Group (NRG), Advanced Computing and Information Systems (ACIS) Lab, and Computational NeuroEngineering Laboratory (CNEL) at the University of Florida. The CW enables neurophysiology researchers to setup, carry out, monitor and review experiments locally that require powerful remote online and/or offl ine data processing. The initial design of the CW was centered around the development of BMIs where neural activity was used to directly control the position of a robotic (prosthetic) arm. The CW was designed to improve the effi ciency and effi cacy of the typical workfl ow required by closedloop experiments. Specifi cally, several aspects of the workfl ow can be automated, hidden or supported by a cyberinfrastructure that exposes, through a single point of access, only the needed functionality for a neuroscientist to conceptualize, conduct and reason about an experiment. Figure 1 depicts a high-level view of what such an infrastructure might consist of. Neural signals are sampled from behaving animals at NRG and sent across the network for computing. At ACIS (>500 m away from NRG), a variety of computational models process these data on a pool of servers; then results are aggregated and used to control robot movement in real-time at NRG. The architecture of the CW is described in the next section and has general applicability beyond the specifi c environments of the ACIS, NRG and CNEL laboratories.
Under the hood of such interface, middleware would be responsible for supporting: • Real-time operations that scale with biological responses (<100 ms) • Parallel processing capability for many multi-input, multioutput models • Large memory capacity for neurophysiological data streams and model parameterization • Customizable signal processing models • Data warehousing for large volumes of neural signals, experiment parameters, and computational results (e.g. prosthetic movements) • Integrated analysis platform for visualization of modeling and physiological parameters • Simple yet powerful user interfaces for scientists to manage experiments These application requirements arise from over three decades of BMI research (Leuthardt et al., 2006;Hatsopoulos and Donoghue, 2009;Nicolelis and Lebedev, 2009). Prior BMI researchers have exploited distributed local computation (Laubach et al., 2003;Wilson and Williams, 2009) or controlled robots by sending control signals around the world (Fitzsimmons et al., 2009). However, to our knowledge, the CW is the fi rst architecture to meet these requirements through distributed, remote computing.

ARCHITECTURE
The CW architecture was designed to satisfy all of the application requirements in the prior section. Researchers may combine existing models (e.g. a toolbox) and/or develop novel models via object oriented programming constructs. These models are connected in a block diagram and then adaptive middleware transparently implements the specifi cations using the full power of the remote hardware. In effect, the middleware deploys an on-demand neuroscience research test-bed that consolidates distributed software and hardware resources to support time-critical and resourcedemanding computing.
Multiple interacting and customizable components make it diffi cult to give a single 'blueprint' of the CW. However, the CW is divided into four functional layers in Figure 2; these layers mirror the key stages of the typical workfl ow  of a BMI experiment: • Access to experimental test-bed • Model selection and composition • Virtual resources request and reservation • Run-time management of experiments Each of these four stages is discussed in this section, fi rst in a general context and then as embodied in our CW implementation. Real-time system operation is achieved through the use of low-level communication protocols and interfaces, use of parallel computing to reduce computation times and careful optimization of middleware components in the critical path (closed loop in Figure 1). These and other IT-related design and implementation details will be reported elsewhere (Rattanatamrong et al., 2009).

ACCESS TO EXPERIMENTAL TEST-BED
This functionality enables neuroscientists to gain access to a physical system consisting of the subject and all devices needed to enable and monitor subject-environment interaction. Necessarily, such a system is specifi c to a class of experiments and may change over time (e.g. subject specifi c differences). If a test-bed can be reused among several researchers, ideally individual researchers could avoid the overhead (time and cost) of interfacing and maintaining different components. Instead, it is desirable to allow researchers to specify needed confi gurations without having to directly modify individual components, each of which may have its own idiosyncratic programming procedures.
The CW enables both the reuse of existing system components and the ability to integrate with remote resources during experiments through what we labeled as a physical layer (see Figure 2A). This physical layer is transparent to the CW users but provides an interface with data acquisition hardware and prosthetic devices. This layer ensures that formatting conventions are respected and network communication between the neurophysiology lab and remote computational resources is stable and reliable. During experimental design, the user must confi gure the physical layer, i.e. specify the manufacturer and model number such that the CW is aware of all hardware it should interact with. However, this confi guration is kept at a block-diagram level. Next we provide an example of basic interfacing in the physical layer and communication to the remote facility.

Data acquisition from neurophysiology hardware
In our physical layer, the CW was confi gured to sample neural signals from a multi-channel, digital signal-processing (DSP) device 1 . The DSP contains buffers that store estimated fi ring rates from electrodes implanted in behaving animals at the NRG laboratory. After acquiring neural activity from the brain, the CW packs data into an input data structure and sends it over the campus network to the ACIS compute cluster. Our implementation sampled 32 electrodes (up to 96 neurons on our DSPs) in 8.1 ms (see "Evaluation" section). Expanding this to 256 neurons (assuming eight detected neurons per electrode) with current Tucker-Davis Technologies components would only double the necessary sampling (due to better compression). The theoretical maximum number of electrodes the CW could support involves a tradeoff between other processes in the computation budget and network transfer speed. We believe this sampling time will scale linearly with the number of electrodes for the range of array sizes used in current BMI studies (Hatsopoulos and Donoghue, 2009).

Prosthetic control
The physical layer also must be confi gured to control the prosthetic. At NRG the prosthetic was a robotic arm (Dynaservo, Markham, ON, Canada) using four degrees of freedom. This robot accepts both incremental and point-to-point types of movement commands. Incremental control specifi es differential changes for each joint of the robot arm, which must fi nish within a closed-loop cycle (∼100 ms). Alternatively, point-to-point control specifi es absolute endpoints but does not guarantee completion time. The CW could seamlessly communicate with the robot in either mode (and switch between modes) to exploit the advantages of each.

Communication to remote facilities
Once this layer is confi gured, it is necessary to communicate with the remote compute cluster. A connectionless User Datagram Protocol (RFC 768 Standard) socket was selected to provide low latency communication between the data collection site (NRG) and processing site (ACIS). This protocol provides lower overhead for real-time operation at the expense of reliability. Data loss and unexpected delay could happen at any time during the communication over the campus network because of its unreliable and shared nature; thus, reliable data transfer must be ensured at the application level via middleware to support reliable and real-time experiments. Middleware at the remote facility creates a timeout if it is unlikely that computation results will be available in time to meet the deadline. A circular fi rst-in-fi rst-out (FIFO) buffer is utilized to store newly acquired data samples while retransmitting the previously failed results. The possible delay between neural activity and robot action is limited by the size of the buffer, which is adjustable based on model requirements and user preference.

MODEL SELECTION AND COMPOSITION
This functionality enables a computational neuroscientist to choose and implement models on the basis of experimental goals. These models may have components involving statistics and machine learning (e.g. adaptive fi lters, neural networks, Hidden Markov models, etc. (Trappenberg, 2002)). Desirably, an investigator should have access to a toolbox of models that could be instantiated easily and composed as necessary to capture the architecture of the neural system under investigation. The toolbox should be easy to extend and share with different researchers. We provided this toolbox in what we labeled an allocation layer (see Figure 2B). It uses a template-based approach where a researcher can specify his or her choices of modules for each part of the template. As long as the modules satisfy pre-specifi ed interface requirements, the components instantiated on the template are guaranteed to communicate and interact correctly. In this layer, the user specifi es data fl ow, model types, experiment priority, and visualization parameters. Middleware infers the amount and type of resources that need to be reserved to execute all the models to meet these specifi cations. Additionally, it checks the software design in the user-created models to ensure there are no logic violations with respect to the expected interfaces. Next we provide more details of basic confi guration in the allocation layer.

Data fl ow
Notice in Figure 2A that the local BMI Controller box, which coordinates data acquisition, signal processing, model adaptation, and prosthetic control, is grayed out. The CW makes that component unnecessary in the neurophysiology lab; instead users specify how neuro-physiological data will be processed (e.g. serially or in parallel) at the remote facilities by creating a block diagram to establish the data fl ow to/from computational models. Figure 2B illustrates possible connections with gray arrows and possible parallel processing (of model sets) with gray planes. This allows user fl exibility and also conveys explicitly what can be executed concurrently in order to exploit the parallel computing capability of remote machines. The data fl ow partially specifi es the overall complexity of the remote processing and can also be used to implicitly identify further parallelism at a fi ner grain. The actual models used in each block (e.g. Model A, B, or C) also contribute to complexity and are discussed next.

Model selection
After specifying the data fl ow, the user must also specify lower-level processing, i.e. which computational models will be used for each block. Here the user has the option to select from a toolbox of established signal processing models (e.g. recursive least squares, moving average). This toolbox is advantageous because the models will have been thoroughly error-checked and vetted by other users. Alternatively, the user can upload custom C++ algorithms for each block.

Visualizations
The user also specifi es which experimental variables will need to be monitored during the actual experiment, i.e. at run-time. This is also done in the block diagram (see Figure 2D). The user inserts and connects a visualization block to an appropriate area to monitor a variable. The user then specifi es which type of visualization (e.g. trace, histogram, error rate) is appropriate.

Experiment priority and type
The fi nal decision is the experiment's priority. The user informs the CW what type of experiment they plan to run, i.e. online, real-time analysis, or offl ine. An online experiment will run at the user's lab and has the highest priority -the CW must meet all user-specifi ed deadlines. A real-time analysis experiment piggybacks onto a collaborator's experiment, e.g. a user at the CNEL lab can run real-time analysis of an online experiment at the NRG lab. An offl ine experiment has the lowest priority because it does not have any hard deadlines. Offl ine experiments are useful to try new signal processing techniques on existing data.

VIRTUAL RESOURCES REQUEST AND RESERVATION
Computing BMI models in real-time often requires investigators to properly structure their models into computational tasks that can be executed concurrently and/or use models that have been previously optimized for parallel execution. These tasks then need to be programmed and deployed on computers that could be dedicated to the test bed or part of a shared computer facility that is usually remote. In the latter case, researchers need to fi gure out the necessary amount of resources, how to request and reserve them, how to connect to these remote resources and how to submit tasks to (and retrieve results from) them. Ideally, investigators should instead only have to do high-level organization that relies on their domain knowledge (i.e. specifi c connectivity and processing in the allocation layer) rather than having to refashion algorithms and making sure that communications and computations take place properly on available hardware, both of which require IT expertise. Depending on the extent of the computational demands, the need for real-time computation and the complexity of accessing remote resources, this experimental stage can be rather complex and a signifi cant barrier to the successful completion of an experiment. We provided this functionality in what we labeled the confi guration layer. This transparent layer includes middleware that organizes and allocates virtual resources to the appropriate physical machines based on the needs specifi ed by the allocation layer. An example of user design choices and middleware reservations is shown using black arrows on the left and right side Figure 2C respectively. The CW uses virtualization technology to create virtual machines (VM) and each VM can be customized with the necessary execution environment, including operating system and libraries, to support seamless deployment of a computational model. Multiple models can run concurrently with dedicated VMs, where resources are dynamically provisioned according to model demands and timing requirements. Virtualization provides effi cient utilization and isolation of resources (e.g. for parallel computation).
Communication between different models is realized using the Message Passing Interface (MPI). MPI allows communication with both simplicity and portability across systems (Snir et al., 1995). The integrity and consistency of data in the CW is protected through the mechanisms available from communication channels. In addition, we isolate data addition, modifi cation and deletion from different users in the CW using a lock mechanism. Further improvement via transaction-based consistency protocol is expected as the CW becomes available to more users.

RUN-TIME MANAGEMENT OF EXPERIMENTS
After physical connection, confi guration, allocation the experiment is ready to run. All data must properly collected and unanticipated situations properly handled and documented. Beyond collecting data, it is desirable to be able to graphically visualize spatio-temporal data, critical measures, and control experimental parameters (if necessary). The CW includes what we label a runtime layer (see Figure 2D) to ensure all specifi cations are met.
At the compute cluster, control scripts are used to orchestrate computation processes and communication among them. The master process (blue server in Figure 2D) manages and distributes data to other computers (worker processes) in the cluster (black servers). The worker processes execute the user's control scheme. The control scheme comprises both computation of commands from neural signal and continuous model adaptation. All model set (black planes) results are aggregated at the master process and sent back to NRG for prosthetic control. Model adaptation occurs between neural commands.
Middleware monitors the elapsed time after sending out neural data. The elapsed time includes the segments marked in Figure 2D with green (outgoing) and red (incoming) arrows. A timeout happens when it detects that it is unlikely to get the computation results back (following the red path) in time to meet the deadline (e.g. 100 ms). In a timeout, the middleware stores the failed data sample in a circular fi rst-in-fi rst-out buffer, and starts a new closed-loop cycle by polling the neurophysiology hardware. During the new cycle, it queues the newly acquired sample in the buffer and retries the transmission of the previously failed sample. A delay between neural activity and robot action will occur after this type of timeout. However, the acceptable extent of this delay is adjustable based on model requirements and user preference.

WEB PORTAL
The prior sections provided an abstract overview of the CW's functional architecture. Here we describe the Web-based portal 2 where users actually interact with the CW. Through this portal, model API, into the CW's collection via the integration portlet. This portlet requires an XML (Sperberg-McQueen et al., 2008) model specifi cation fi le ( containing all information regarding the model's confi gurable parameters), sample input fi les, and default parameter values; this information enables the experiment manager portlet to automatically generate a model confi guration interface for users.
A conceptual overview of visualization portlet is given in Figure 4, it is a window to the CW's run-time layer. The visualization portlet supports the monitoring of prosthetic and model behavior via real-time video streaming. When the user terminates the experiment (as scheduled or due to an emergency), they will not lose any results. All details of the experiment can be downloaded as one archived fi le and examined locally after the experiment is ended. Also, users can perform extensive, remote analysis immediately with the tools (e.g. Matlab) provided in the CW. Offl ine and analysis experiments are confi gured similarly (Experiment types detailed in the section 'Experiment Priority and Type'). If the user would like to adjust the model confi guration to study alternative parameters or to improve performance, they can fi netune dynamic model parameters during the experiment in the experiment management portlet.

EVALUATING THE CYBERWORKSTATION
The CW has been designed to meet seven specifi cations listed in the section 'Functional System Requirements' . In the prior sections, we have developed an architecture that is powerful, fl exible, and userfriendly. Here, we use an online BMI experiment to benchmark the users can access the CW from anywhere with an Internet connection. Portal functionalities are provided via AJAX-based JSR-168 compliant portlets, which allow fl exible interface customization and responsive, asynchronous content update. Users can confi gure the portal environment to meet their needs. The main portlets developed were model integration, experiment management, and visualization. Figure 3 illustrates how a user would interact with the web portal to confi gure the allocation layer (see Model Selection and Composition). After making all physical connections with the CW (e.g. physical layer), the user should be able to interface with all other CW layers through the Web portal. All input, output, and confi guration data of this experiment are persistently stored in the compute cluster, so users can retrieve data or replay the experiment later. The experiment management portlet in the Web portal handles the selection of motor control model sets and model connectivity (i.e. allocation layer). It also ensures that any custom models follow the CW's model API. Model confi guration (e.g. static and dynamic parameters, initializations, etc.) and visualization planning will also be performed in this portlet. Additionally, users can fi ne-tune dynamic model parameters on the fl y. Beyond the collection of models already provided in the CW, users can add custom models, developed by the CW's Cyber-workstation for computational neuroscience TD learning is a boot-strapping method using future value estimates to improve current estimates. TD learning is sensitive to the accuracy of Q because it assumes Q(s t + 1 , a t + 1 ) is a good approximation of R t + 1 ; hence, it may become unstable. However, TD methods asymptotically converge to Q* (Sutton and Barto, 1998

RL-based BMI (RLBMI)
We model the interaction of a paralyzed patient (i.e. user) with an intelligent prosthetic controller (agent) to maneuver a robotic arm to a target lever and press it (similar to reach and grasp) as a cooperative RL task (DiGiovanna et al., 2009). A RL framework is useful for a BMI because it creates an agent that constantly learns from interacting with the world without requiring a 'desired' (training) signal (e.g. user movements). We assume relevant control features (e.g. robot position, goal) will be detected by the user and affect the user's neuronal modulations. To achieve control, the agent must learn both how to detect relevant neuronal states and, given the state, evaluate each possible control action such that it can complete tasks (i.e. earn reward). Briefl y, rats had to use a BMI to maneuver a robot to one of two possible targets (randomly selected). Rats were given cues (e.g. lights, target position, robot position) for the task. In brain control, the rat's objective was to modulate neurons in primary motor cortex to create a neural state. The agent's objective was to perform value function estimation on this neural state and select a robot control action every 100 ms. If the agent and rat successfully maneuvered (selected a sequence of actions) the robot endpoint proximal to a target within 4.3 s, both were given a reward and the trial ended successfully. Otherwise, the robot was reset to original position and neither entity was rewarded. Important RL features: • States: ensemble of neuronal fi ring rates • Actions: robot endpoint motions (set of 26) • Rewards: +1 proximal to correct target, else −0.01

Value function estimation
Neuronal state detection and value function estimation in the above paradigm are both non-trivial tasks for the agent. The state was (average) 60 dimensional (one fi ring rate or history per dimension) which is intractable for familiar look-up table approaches (Sutton and Barto, 1998). Instead, an adaptive projection transforms the state into a subspace where segmentation is performed (DiGiovanna et al., 2007). Specifi cally, a Multi-Layer Perceptron (MLP) both segments the state (in the hidden layer) and estimates the value Q (in the output layer) as: Q is a function of state and weights (w) with i, j, and k indexing the state, hidden layer, and output layers respectively. An εgreedy policy typically selects the action corresponding to the maximum Q k . CW's effectiveness in meeting each requirement. BMI architectures and learning tasks are briefl y overviewed for context. Then we demonstrate CW performance for each of the seven specifi cations.

BRAIN-MACHINE INTERFACES
Conceptually, a BMI learns an unknown mapping between neural signals and some prosthetic output (e.g. cursor position, endpoint velocity) to decode user 'intent' . This mapping is typically initially random but is adapted over time to minimize some error metric (e.g. misalignment, failed trials). There are many BMI implementations (e.g. linear regressors, population vectors, kernel methods, point process models); each is tailored towards a particular application (Schwartz et al., 2006;Sanchez and Principe, 2007;Hatsopoulos and Donoghue, 2009). The CW's fl exible architecture supports a variety of BMI systems whether they use single models [e.g. Recursive Least Squares (RLS)] or combinations of models (e.g. mixtures of experts). We developed a BMI based on Reinforcement Learning (RL) (DiGiovanna et al., 2009); in the next subsections we describe RL, RL-based BMI, and the learning tasks. This BMI specifi cally used real-time processing, limited parallel computation, 'large' memory capacity, and data warehousing capabilities of the CW. Later, we also address how other CW abilities could be exploited.

Reinforcement learning
The RL paradigm involves an agent that learns to interact with an environment in a fashion that maximizes future rewards (Sutton and Barto, 1998). The agent environment interface is modeled as a Markov Decision Process. Agent actions infl uence environmental state and after completing an action, the environment may provide a reward. The agent tries to maximize return R t which is simply the discounted (γ≤1) sum of rewards r n for all future times n. Equation 1 shows that return is conditional on current state-action pairs. The agent does not know whether selected actions were optimal as they are executed. However, over time it can build an estimate of return -a value function Q. These value functions (Eq. 2) are recursively consistent such that Eq. 2 can be rearranged into Eq. 3 (please refer to Sutton and Barto, 1998 for derivation). Given Eq. 3 the agent can start with an initially random Q and learn the value function from observations of states, actions, and rewards. If the agents learns the optimal value function Q*, then it can always maximize reward by taking actions which maximize Q*. RL can provide an unbiased estimator of the Bellman Optimality Equation (4) (Sutton and Barto, 1998) which can be solved if the state transition P ss a ′ and reward probabilities R ss a are known and stationary. In situations where these probabilities are unknown and/or non-stationary (e.g. a BMI), RL methods can adapt the value function (Eq. 3) towards the Q* based on temporal difference (TD) error (Eq. 5) (TD error is created by rearranging the terms in Eq. 3). Positive error indicates things are better than expected and Q(s, a) should be increased, negative error indicates the opposite.
There are two computational tasks created here, network initialization and adaptation. Initialization refers to rapid adaptation of initially random MLP weights to estimate Q based on a sequence of observed states, actions, and rewards. Here the mapping is nonlinear and contains many local minima. To increase the probability of converging to a global minimum, multiple networks were created and trained. Networks were adapted by back-propagation of TD(λ) error (similar to TD error described above) as described in (Sutton and Barto, 1998). Each network was trained for up to 1000 times (epochs). We initially did this offl ine due to computational limitations (speed and memory) in our local workstation. Initialization includes many adaptations; however, real-time adaptation is challenge that cannot be addressed offl ine. After each action (every step) the agent has a new state, action, and reward observation; hence new error. Q(λ) learning can require up to 42 complete weight updates [see Real-Time Operation that Scale with Biological Responses (<100 ms)] per time-step in this task. This must be completed within 100 ms for adaptation and as soon as possible for initialization.

EXPERIMENTAL SETUP
A series of both online and simulated RLBMI experiments was used to benchmark CW performance. A 100-ms deadline was imposed on each closed-loop control cycle, which consists of four phases: data acquisition, network transfer, model computation, and robot control (see Figure 2D). Neural signals (32 channels) were sampled through a DSP device (online) or from offl ine data stores (simulated). The data acquisition and robot control server (local) has dual 2.4 GHz Xeon processors and runs Windows Server 2003. Remote computation was conducted on a cluster of VMware ESX server 3.0-based VMs hosted on several dual 3.2 GHz Xeon servers. Each VM has 1 GB RAM and runs Ubuntu Linux 7.04. The experiment was submitted, controlled, and monitored locally via the Web portal.

Real-time operation that scale with biological responses (<100 ms)
In an online RLBMI experiment with 8151 time steps (13 m 22 s), 99% of closed-loop control cycles were completed in less than 10 ms; 100% completed within 100 ms (Zhao et al., 2008). This demonstrates the CW provides a high-performance computing environment capable of real-time experiments including BMI adaptation. Table 1 reports timing results in more detail, including the statistics of the entire cycle time as well as individual phase latency (see Figure 2D for labels).
The data acquisition phase is responsible for a majority of overall variability in closed-loop operation time. Aquiring neural ensemble fi ring rates from neurophysiology hardware requires 256 bytes (8 bytes per electrode), which is trivial to transfer. However, problems arise because Windows Server 2003 is not a real-time operating system. Although we assign both real-time priority and stop unnecessary services, the Windows Scheduler did not always allow acquisition (or robot control) to start immediately.
The relatively large variance in model computation is related to the specifi c algorithm used in this example. In Q(λ) learning, error is back propagated through the MLP at each time step (similar to other neural networks). However, this error includes an approximation of return (Eq. 1), which is not completely available at time t + 1 (details in Sutton and Barto, 1998). Instead the MLP was adapted multiple times; increasingly often as trial length increased. For example, a four-step trial has an average of 1.5 weight updates per time step [(3 + 2 + 1)/4]. If the rat used the maximum trial length (43 steps) then there were 21 average weight updates per time step. Since the rat is free to maneuver the robot along any path (and trials may not be successful), trial length and computation complexity were variable.

Parallel processing capability for multi-input, multi-output models
In addition to closed-loop (online) mode, the CW excels in offl ine BMI training. A modest example is shown in Figure 5 for the real world problem of RLBMI initialization (also common to other BMI). Specifi cally, value function estimation (see Value Function Estimation) is not guaranteed to converge to a global optimum, so multiple MLPs (planes in Figure 2) are initialized and trained to fi nd the best performer. The CW allocates VMs to initialize and train each MLP. The CW outperforms local computation if at least 2 VM (each training 18 MLPs) are used. If 18 VM (each training 2 MLPs) are used, initialization time is reduced by nearly a factor of 5. This creates the opportunity to fully initialize (or re-initialize) the BMI with the CW between

CyberWorkstation Local Processing
FIGURE 5 | Cyberworkstation performance as a function of number of virtual machines used. The time required (error bars are standard deviation in 10 trials) to initialize a RLBMI using rat data from (DiGiovanna et al., 2009). The CW is faster (mean) than the local computer (AMD Turion 64 X2 [1.8 GHz dual core], 4 Gb RAM) when more than one VM is used (single VM time was 4883 ± 33 s). Additionally, the local computer is free for other processing tasks. normal trials 3 . This contrasts sharply with our prior solution of offl ine batch training in (DiGiovanna et al., 2009) or the common practice of disjoint train and testing phases (Hatsopoulos and Donoghue, 2009). The CW provides the necessary computational performance in experiments where scalability and code reusability are critical. A remote, resource-rich computing center makes it possible to provision the CW with large numbers of virtual/physical machines. This removes resource impediments to the scaling of models and data sets leaving only algorithm parallelization constraints. Two interesting aspects of Figure 5 are that the CW is slower than local processing when a single VM is used and computation time increases from 18 to 36 VMs. Overheads in virtualization, communication, resource sharing, parallelization and middleware execution that are not present in a local computer cause the fi rst issue. Using only a single remote VM creates this overhead without any of the benefi ts; hence, the local workstation is faster. Using additional VM creates speed increases up to a certain point (here >18) when the number of VMs becomes suffi ciently large such that the communication and sharing behavior (all VMs write to the same set of fi les) between VMs exceed the computational savings gained from parallel computation of RLBMI networks.

Large memory for neurophysiological data streams and model parameterization
Brain-machine interfaces can present variable challenges to a cyber workstation depending which models (or model sets) are being employed. For example, a linear regression from neural signals to hand position using least mean squares (common in BMI) has complexity O(N) (where N is the number of neural inputs) for each output dimension, D. If a designer wanted to use Gaussian Processes to transform the neural data before fi ltering, they could add another model but this would have complexity O(M 3 ) where M is the number of training samples. Neurophysiological data often is repeatedly processed in a BMI; hence it may require large, dynamically allocated memory blocks that must scale with N and time of control. Admittedly, RLBMI is parsimonious in parameters relative to other BMI; we expect CW strengths will be better showcased by other BMI -some of which scale exponentially with N or D.

Customizable signal processing models
The aspect of the CW with the greatest research potential is its inherent fl exibility. Both data fl ow and signal processing are totally customizable, e.g. in Figure 2B the CW user is presented with a fully interconnected set of three processing models between the input and output. Figure 2C shows how the user could confi gure the data fl ow. Specifi cally, the input goes directly to Model A. Model A has reciprocal connections with B and an outgoing connection to C. Model C has outgoing connections to B and the BMI output. The remaining gray arrows were not utilized. Additionally, the user has selected multiple parallel BMI implementations that may or may not have the same data fl ow as described above (the user can customize each parallel implementation). The models (i.e. A, B, C) in Figures 2B,C are customizable. Users can either select models from an established model toolbox (e.g. least-squares, RLS) or create their own models in C++ and upload them to the CW.
Model topology and confi guration will depend on the particular task attempted. One interesting combination that could be applied to a state-based controller (e.g. RLBMI) involves Gaussian Process models (Deisenroth et al., 2009) to segment neural state, linear fi lters to calculate state value, state/reward transition models to estimate the future, models that aggregate learning from experience and prediction, and fi nally inverse kinematics optimization models to fi nd robot actuations that achieve selected actions. It may also be useful to have a mixture of experts, which is parallel implementation of some of these models such that there is an expert (e.g. value function estimator) for different areas of the state space. We are actively pursuing this topic using the CW.

Integrated analysis platform for visualization of modeling and physiological parameters
The CW also facilitates integrated analysis both for the local (neurophysiology) lab and collaborators elsewhere. A conceptual overview of this integration is shown in Figure 6 where local CW users are running a standard experiment with a single visualization. Additionally, collaborators are simultaneously analyzing (processing represented by dashed lines) the same experiment. The remote lab can perform additional analysis and communicate with the local lab through an instant messaging client. This is advantageous for the 3 For typical experiments, the minimum (average) inter-trial time was 1:06 [m:ss] (2:12). Using 18 VMs (smaller network than Figure 5; local computation time was 15:33), the entire initialization time was 1:26. Given data, we could theoretically train the networks quickly enough that the user may not notice the delay before they were able to control the prosthetic. It is often the case that data from BMI or general neuroscience experiments are used for follow-up studies which require access to the experimental data according to specifi c formats and execution of processing tasks to analyze, mine, or identify patterns of interest in the data. One hour of experiments can easily generate gigabytes of data. The CW securely preserves this data in persistent storage, yet keeps it easily available for online processing and visualization. Additionally, analysis code can be centrally stored to ensure reported results are reproducible.
The CW provides the necessary resources for future studies including models to estimate future neural states, environmental rewards, and user's internal reward ). This modifi cation would allow a BMI to learn from both experience (DiGiovanna et al., 2009) and model prediction of possible environmental interactions; thus facilitating faster learning. Additionally, fi nding explicit functional relationships between brain states, prosthetic actions, and rewards could provide insight into how the different brain areas process and share information. Even biologically reduced models (reductionist approach) can be analyzed to reveal features (e.g. parameter sensitivity, hidden states) which correspond to neural or behavioral decision variables (Corrado and Doya, 2007).
The capability to include extra models is also useful for prosthetic control, specifi cally for fi nding differential commands for each robot joint based on predicted endpoint positions. Determining appropriate commands requires inverse kinematics, but this calculation for a redundant four degree-of-freedom robot (non-unique endpoint to joint angles mapping) does not have a closed form solution (Craig, 1989). Joint angles can be found via optimization, but it is not guaranteed to fi nd a feasible solution. Error checking and repeated optimizations consume the computing budget. Parallel computation increases the probability of at least one feasible solution in minimal time.
The CW does face a number of possible limitations. The major impediment is that researchers have developed their own models and are not able to devote resources to convert them to C++ compatible with the CW's API. This could limit both the number of available models in the toolbox and number of CW users (to refi ne said models). Future CW development may expand support for models written in other languages (e.g. Matlab, Python); however, CW adoption may be too time consuming for some labs. Modeling fl exibility could be helpful for users who do use the CW; however, middleware is only responsible for checking that users' code includes a set of data structures for communication -there is no check whether the code works. Users will need to validate models on similar processing architectures before uploading them to the CW. Additionally, transmitting neurophysiological data from animals or humans across the Internet could raise privacy and security concerns [e.g. compliance with Health Insurance Portability and Accountability Act (HIPAA) laws]. Finally, real-time communication cannot be guaranteed for arbitrary Internet connections whose communication latencies are excessive.
Overall, this solution for distributed BMIs could lay the ground for scalable middleware techniques that, in the long run, can support increasingly elaborate neurophysiologic research test beds in which subjects can carry out more complex tasks. These advanced local lab because they get feedback and assistance from the remote lab. The remote lab can implement advanced analysis metrics and even alternative models without causing the local lab to divert their own attention from ensuring the BMI user's safety, the prosthetic function, and their own analysis. If the remote lab develops a model that exhibits better performance (prosthetic control), then ideally their model could be selected and put online locally. The remote lab also benefi ts because they are given access to a closed-loop experiment without having to invest in access to BMI users and neurophysiology hardware.

Data warehousing
The CW also provides vast data warehousing capability. Before the CW, computational neuroscience experiments at the NRG lab generated 1-2 Gb of data per hour. Storage was scaled via external hard drives; however, drive failures could be catastrophic, switching between users required switching drives, and these drives were not easily incorporated into a data back-up system. The CW simplifi es and centralizes this storage to a Redundant Array of Inexpensive Disks (RAID) high-capacity storage server in the ACIS lab.

Simple user interfaces
As previously described in the section 'Architecture' , the CW was designed with a focus on creating a simple and user-friendly interface to allow BMI developers to focus on research instead of lowlevel technical details. In order to use available models in the CW, the user does not need to know the specifi c details of the models; he or she only needs to understand the concepts. Help is provided via in-line mouse-over tips to be able to work with the models in short amount of time. The user interface also integrates multiple terminals (portlets) while conducting online experiments to give users a unifi ed control interface.

CONCLUSION
We have developed here a new framework to seamlessly bridge computational resources with in vivo experiments to better study the functional aspects of brain systems operating in dynamic environments. Online and offl ine BMI experiments execute in a closed-loop manner that includes in vivo (for online experiments) data acquisition, reliable network transfer, parallel model computation, and real-time robot control. Scientists can conveniently deploy their algorithms and control structures on the cyber-infrastructure and conduct research through its Web portal. We have shown proof of concept that BMI control schemes (specifi cally RLS and RLBMI) could be implemented and tested on the CW. Additionally; we demonstrated that the CW met each requirement on the BMI designer's wish list.
This CW potentially dissolves analysis barriers in neurophysiology laboratories while further systematizing both experimental and computational neurophysiological investigations. The CW achieves this functionality by linking available hardware with user-friendly software architecture via a powerful middleware layer that manages their interaction. This novel integration approach makes the CW powerful and customizable but also hides complexity with easy-to-use interfaces for users to conduct research. This simple interface makes the CW accessible to a variety of users, e.g. scientists, engineers, and clinicians. We hope this accessibility catalyzes collaborative research.