BluePyOpt: Leveraging Open Source Software and Cloud Infrastructure to Optimise Model Parameters in Neuroscience

At many scales in neuroscience, appropriate mathematical models take the form of complex dynamical systems. Parameterizing such models to conform to the multitude of available experimental constraints is a global non-linear optimisation problem with a complex fitness landscape, requiring numerical techniques to find suitable approximate solutions. Stochastic optimisation approaches, such as evolutionary algorithms, have been shown to be effective, but often the setting up of such optimisations and the choice of a specific search algorithm and its parameters is non-trivial, requiring domain-specific expertise. Here we describe BluePyOpt, a Python package targeted at the broad neuroscience community to simplify this task. BluePyOpt is an extensible framework for data-driven model parameter optimisation that wraps and standardizes several existing open-source tools. It simplifies the task of creating and sharing these optimisations, and the associated techniques and knowledge. This is achieved by abstracting the optimisation and evaluation tasks into various reusable and flexible discrete elements according to established best-practices. Further, BluePyOpt provides methods for setting up both small- and large-scale optimisations on a variety of platforms, ranging from laptops to Linux clusters and cloud-based compute infrastructures. The versatility of the BluePyOpt framework is demonstrated by working through three representative neuroscience specific use cases.


INTRODUCTION
Advances in experimental neuroscience are bringing an increasing volume and variety of data, and inspiring the development of larger and more detailed models (Izhikevich and Edelman, 2008;Merolla et al., 2014;Markram et al., 2015;Eliasmith et al., 2016). While experimental constraints are usually available for the emergent behaviors of such models, it is unfortunately commonplace that many model parameters remain inaccessible to experimental techniques. The problem of inferring or searching for model parameters that match model behaviors to experimental constraints constitutes an inverse problem (Tarantola, 2016), for which analytical solutions rarely exist for complex dynamical systems, i.e., most mathematical models in neuroscience. Historically, such parameter searches were done by hand tuning, but the advent of increasingly powerful computing resources has brought automated search algorithms that can find suitable parameters (Bhalla and Bower, 1993;Vanier and Bower, 1999;Achard and De Schutter, 2006;Druckmann et al., 2007;Gurkiewicz and Korngreen, 2007;Van Geit et al., 2007Huys and Paninski, 2009;Taylor et al., 2009;Hay et al., 2011;Bahl et al., 2012;Svensson et al., 2012;Friedrich et al., 2014;Pozzorini et al., 2015;Stefanou et al., 2016). While many varieties of search algorithms have been described and explored in the literature (Vanier and Bower, 1999;Van Geit et al., 2008;Svensson et al., 2012), stochastic optimisation approaches, such as simulated annealing and evolutionary algorithms, have been shown to be particularly effective strategies for such parameter searches (Vanier and Bower, 1999;Druckmann et al., 2007;Gurkiewicz and Korngreen, 2007;Svensson et al., 2012). Nevertheless, picking the right type of stochastic algorithm and setting it up correctly remains a non-trivial task requiring domain-specific expertise, and could be model and constraint specific (Van Geit et al., 2008).
With the aim of bringing widely applicable and state-of-theart automated parameter search algorithms and techniques to the broad neuroscience community, we describe here a Pythonbased open-source optimisation framework, BluePyOpt, which is available on Github (see Blue Brain Project, 2016), and is designed taking into account model optimisation experience accumulated during the Blue Brain Project (Druckmann et al., 2007;Hay et al., 2011;Markram et al., 2015;Ramaswamy et al., 2015) and the ramp-up phase of the Human Brain Project. The general purpose high-level programming language Python was chosen for developing BluePyOpt, so as to contribute to, and also leverage from the growing scientific and neuroscientific software ecosystem (Oliphant, 2007;Muller et al., 2015), including stateof-the-art search algorithm implementations, modeling and data access tools.
Of course BluePyOpt is not the only tool available to perform parameter optimisations in neuroscience (Druckmann et al., 2007;Van Geit et al., 2007;Bahl et al., 2012;Friedrich et al., 2014;Carlson et al., 2015;Pozzorini et al., 2015). Some tools provide a Graphical User Interface (GUI), other tools are written in other languages, or use different types of evaluation functions or search algorithms. We explicitly didn't make a detailed comparison between BluePyOpt and other tools because many of these tools are developed for specific and non-overlapping applications, making a systematic comparison difficult. This suggests perhaps BluePyOpt's greatest strength, its broad applicability relative to previous approaches.
At its core, BluePyOpt is a framework providing a conceptual scaffolding in the form of an object-oriented application programming interface or API for constructing optimisation problems according to established best-practices, while leveraging existing search algorithms and modeling simulators transparently "under the hood." For common optimisation tasks, the user configures the optimisation by writing a short Python script using the BluePyOpt API. For more advanced use cases, the user is free to extend the API for their own needs, potentially contributing these extensions back to the core library. The latter is important for BluePyOpt APIs to remain broadly applicable and state-of-the-art, as best-practices develop for specific problem domains, mirroring the evolution that has occured for neuron model optimisation strategies (Bhalla and Bower, 1993;Hay et al., 2011).
Depending on the complexity of the model to be optimised, BluePyOpt optimisations can require significant computing resources. The systems available to neuroscientists in the community can be very heterogeneous, and it is often difficult for users to set up the required software. BluePyOpt therefore also provides a novel cloud configuration mechanism to automate setting up the required environment on a local machine, cluster system, or cloud service such as Amazon Web Services.
To begin, this technology report provides an overview of the conceptual framework and open-source technologies used by BluePyOpt, followed by a presentation of the software architecture and API of BluePyOpt. Next, three concrete use cases are elaborated in detail, showing how the BluePyOpt APIs, concepts and techniques can be put to use by potential users. The first use case is an introductory example demonstrating the optimisation of a single compartmental neuron model with two Hodgkin-Huxley ion channels. The second use case shows a BluePyOpt-based state-of-the-art optimisation of a morphologically detailed thick-tufted layer 5 pyramidal cell model of the type used in a recent in silico reconstruction of a neocortical microcircuit (Markram et al., 2015). The third use case demonstrates the broad applicability of BluePyOpt, showing how it can also be used to optimise parameters of synaptic plasticity models.

CONCEPTS
The BluePyOpt framework provides a powerful tool to optimise models in the field of neuroscience, by combining several established Python-based open-source software technologies. In particular, BluePyOpt leverages libraries providing optimisation algorithms, parallelization, compute environment setup, and experimental data analysis. For numerical evaluation of neuroscientific models, many open-source simulators with Python bindings are available for the user to choose from (see Section 2.2). The common bridge allowing BluePyOpt to integrate these various softwares is the Python programming language, which has seen considerable uptake and a rapidly growing domain-specific software ecosystem in the neuroscience modeling community in recent years . Python is recognized as a programming language which is fun and easy to learn, yet also attractive to experts, meaning that novice and advanced programmers alike can easily use BluePyOpt, and contribute solutions to neuroscientific optimisation problems back to the community.
BluePyOpt was developed using an object oriented programming model. Figure 1 shows an overview of the class hierarchy of BluePyOpt. In its essence, the BluePyOpt object model defines the Optimisation class which applies a search algorithm to an Evaluator class. Both are abstract classes, meaning they define the object model, but not the implementation. Taking advantage of Pythonic duck typing, the user can then choose from a menu of implementations, derived classes, or easily define their own implementations to meet their specific needs. This design makes BluePyOpt highly versatile, while keeping the API complexity to a minimum. The choice of algorithm and evaluator is up to the user, but many are already provided for various use cases (see Section 2.1). For many common use cases, these are the only classes users are required to instantiate.
For neuron model optimisations in particular, BluePyOpt provides further classes to support feature-based multi-objective optimisations using NEURON, as shown in Figure 1. Classes Model, Morphology, Mechanisms, Protocol, Stimuli, Recordings, Location are specific to setting up neuron models and assessing their input-output properties. In the examples in this paper we focus on single cell models by using the CellModel class. There is also the possibility to define classes like e.g., NetworkModel or SynapseModel to optimise other types of models. Other classes Objectives and eFeature are more generally applicable, with derived classes for specific use cases, e.g., eFELFeature provides features extracted from voltage traces using the open-source eFEL library discussed below. They define features and objectives for feature-based multi-objective optimisation, a stochastic optimisation strategy (Druckmann et al., 2007(Druckmann et al., , 2011Hay et al., 2011). We generally recommend it as the first algorithm to try for a given problem domain. For example, the third example for the optimisation of synaptic plasticity models also employs this strategy.
In the sub-sections to follow, an overview is provided for the various software components and the manner in which BluePyOpt integrates them.

Optimisation Algorithms
Multiobjective evolutionary algorithms have been shown to perform well to optimise parameters of biophysically detailed multicompartmental neuron models (Druckmann et al., 2007;Hay et al., 2011). To provide optimisation algorithms, BluePyOpt relies on a mature Python library, Distributed Evolutionary Algorithms in Python (DEAP), which implements a range of such algorithms (Fortin et al., 2012). The advantage of using this library is that it provides many useful features out of the box, and it is mature, actively maintained and well documented. DEAP provides many popular algorithms, such as Non-dominated Sorting Genetic Algorithm-II (Deb et al., 2002), Covariance Matrix Adaptation Evolution Strategy (Hansen and Ostermeier, 2001), and Particle Swarm Optimisation (Kenny and Eberhart, 1995). Moreover, due to its extensible design, implementing new search algorithms in DEAP is straight-forward. Historically, the Blue Brain Project has used a C implementation of the Indicator Based Evolutionary Algorithm IBEA to optimise the parameters of biophysically detailed neuron models (Bleuler et al., 2003;Zitzler and Künzli, 2004;Markram et al., 2015), as this has been shown to have excellent convergence properties for these problems (Schmücker, 2010). Case in point, we implemented a version of IBEA for the DEAP framework, so this algorithm is consequently available to be used in BluePyOpt.
Moreover, DEAP is highly versatile, whereby most central members of its class hierarchy, such as individuals and operators, are fully customizable with user defined implementations. Classes are provided to keep track of the Pareto Front or the Hall-of-Fame of individuals during evolution. Population statistics can be recorded in a logbook, and the genealogy between individuals can be saved, analyzed and visualized. In addition, checkpointing can be implemented in DEAP by storing the algorithm's state in a Python pickle file for any generation, as described in DEAP's documentation (DEAP Project, 2016).
Although the use cases below use DEAP as a library to implement the search algorithm, it is worth noting that BluePyOpt abstracts the concept of a search algorithm. As such, it is entirely possible to implement algorithms that are independent of DEAP, or that use other third-party libraries.

Simulators
To define a BluePyOpt optimisation, the user must provide an evaluation function which maps model parameters to a fitness score. It can be a single Python function that maps the parameters to objectives by solving a set of equations, or a function that uses an external simulator to evaluate a complex model under multiple scenarios. For the latter, the only requirement BluePyOpt imposes is that it can interact with the external simulator from within Python. Often, this interaction is implemented through Python modules provided by the user's neuroscientific simulator of choice, as is the case for many simulators in common use, including NEURON (Hines et al., 2009), NEST Zaytsev and Morrison, 2014), PyNN , BRIAN (Goodman and Brette, 2009), STEPS (Wils and De Schutter, 2009), and MOOSE (Ray and Bhalla, 2008). Otherwise, communication through shell commands and input/output files is also possible, so long as an interface can be provided as a Python class.

Feature Extraction
For an evaluation function to compute a fitness score from simulator output, the resulting traces must be compared against experimental constraints. Voltage recordings obtained from patch clamp experiments are an example of experimental data that can be used as a constraint for neuron models. From such recordings the neuroscientist can deduce many interesting values, like the input resistance of the neuron, the action potential characteristics, firing frequency etc. To standardize the way these values are measured, the Blue Brain Project has released the Electrophysiology Feature Extract Library (eFEL) (Blue Brain Project, 2015), also as open-source software. The core of this library is written in C++, and a Python wrapper is provided.
BluePyOpt can interact with eFEL to compute a variety of features of the voltage response of neuron models. A fitness score can then be computed by some distance metric comparing the resulting model features to their experimental counterparts. As we will see for the last example in this article, a similar approach can also be taken for other optimisation problem domains.

Parallelization
Optimisations of the parameters of an evaluation function typically require the execution of this function repeatedly. For a given iteration of the optimisation, such executions are often in the hundreds (scaling e.g., with evolutionary algorithm population size), are compute bound, and are essentially independent, making them ripe for parallelization. Parallelization of the optimisation can be performed in several ways. DEAP provides an easy way to evaluate individuals in a population on several cores in parallel. The user need merely provide an implementation of a map function. In its simplest form, this function can be the Python serial map in the standard library, or the parallel map function in the multiprocessing module to leverage local hardware threads. To parallelize over a large cluster machine, the DEAP developers encourage the use of the SCOOP (Hold-Geoffroy et al., 2014) map function. SCOOP is a library that builds on top of ZeroMQ (ZeroMQ Project, 2007), which provides a socket communication layer to distribute the computation over several computers. Other map functions and technologies can be used like MPI4Py (Dalcín et al., 2005) or iPython ipyparallel package (Pérez and Granger, 2007). Moreover, parallelization does not necessarily have to happen at the population level. Inside the evaluation of individuals, map functions can also be used to parallelize over stimulus protocols, feature types, etc., however for the problem examples presented here, such an approach would not make good use of anything more than 10 to 20 cores.

Cloud
To increase the throughput of optimisations, multiple computers can be used to parallelize the work. Such a group of computers can be composed of machines in a cluster, or they can be obtained from a cloud provider like Amazon Web Services, Rackspace Public Cloud, Microsoft Azure, Google Compute Engine, or the Neuroscience Gateway portal (Sivagnanam et al., 2013).
These and other cloud providers allow for precise allocation of numbers of machines and their storage, compute power and memory. Depending on the needs and resources of an individual or organization, trade-offs can be made on how much to spend vs. how fast the results are needed.
Setting up a cluster or cloud environment with the correct software requirements is often complicated and error prone: Each environment has to be exactly the same, and scripts and data need to be available in the same locations. To ease the burden of this configuration, BluePyOpt includes Ansible (Red Hat, Inc., 2012) configuration scripts for setting up a test environment on one local computer (using Vagrant HashiCorp, 2010), for setting up a cluster with a shared file system, or for provisioning and setting up an Amazon Web Services cluster.
Ansible is open-source software that allows for reproducible environments to be created and configured from simple textual descriptions called "Playbooks." These Playbooks encapsulate the discrete steps needed to create an environment, and offer extra tools to simplify things like package management, user creation and key distribution. Furthermore, when a Playbook is changed and run against an already existing environment, only the changes necessary will be applied. Finally, Ansible has the advantage over other systems, like Puppet Labs (2005) and Chef (2009), that nothing except a Python interpreter needs to be installed on the target machine and all environment discovery and configuration is performed through SSH from the machine on which Ansible is run. This decentralized system means that a user can use Ansible to setup an environment in their home directory on a cluster, without intervention from the system administrators.

SOFTWARE ARCHITECTURE
The BluePyOpt software architecture follows an object oriented programming model, whereby the various concepts of the software are modularized into cleanly separated and well defined classes which interact as shown in diagrams of the class hierarchy (Figure 1), object model ( Figure 2) and program control flow (Figure 3). In what follows, the role of each class and how it relates to and interacts with other classes in the hierarchy is described.

Optimisation Abstraction Layer
At the highest level of abstraction, the BluePyOpt API contains the classes Optimisation and Evaluator (Figure 2). An Evaluator object defines an evaluation function that maps Parameters to Objectives. The Optimisation object accepts the Evaluator as input, and runs a search algorithm on it to find the parameter values that generate the best objectives. At the moment, the main optimisation subclass is IBEADEAPOptimisation, but this will be extended in the future with other classes for specific search algorithms.
The task of the search algorithm is to find the parameter values that correspond to the best objective values. Defining "best" is left to the specific implementation, but the optimisation algorithm and evaluator should follow consistent conventions. As in the use cases below, the goal of the algorithm could be minimizing a weighted sum of the objectives or a multiobjective front in a multidimensional space.
The Optimisation class allows the user to control the settings of the search algorithm. In case of IBEA, this could be the number of individuals in the population, the mutation probabilities, etc.
Multiple Objectives can be accumulated using e.g., MaxObjective or WeightedSumObjective objects before being passed on to the search algorithm. When running multiobjective optimisations, the algorithm will treat the objectives as separate dimensions, and will not provide a single score to every individual. At the end of an optimisation, the hall of fame ranking is based on the sum of the objectives. Depending on the preferences of the user, this can easily be extended to  include other combinations like root mean square, weighted sum, etc.

EPhys Model Abstraction Layer
On a different level of abstraction, we have classes that are tailored for electrophysiology (ephys) experiments and can be used inside the Evaluator. The ephys model layer provides an abstraction of the simulator, so that the person performing the optimisation is not required to have knowledge of the intricate details of the simulator. This layer is provided as a convenience, and is entirely optional. Users can choose to implement an electrophysiological model in any way they want, as long as they construct their own Evaluator.
A Protocol is applied to a Model for a certain set of Parameters, generating a Response. An ObjectivesCalculator is then used to calculate the Objectives based on the Response of the Model. All these classes are part of the bluepyopt.ephys package.

Model
By making a Model an abstract class, we give users the ability to use our software for a broad range of use cases. A Protocol can attach Stimuli and Recordings to a Model. When the Simulator is then run, a Response is generated for each of the Stimuli for a given set of Model parameter values.
Examples of broad subclasses are a NetworkModel, CellModel and SynapseModel. Specific subclasses can be made for different simulators, or assuming some level of similarity, the same model object can know how to instantiate itself in different simulators. In the future, functionality could be added to import/export the model configuration from/to standard description languages like NeuroML (Cannon et al., 2014) or NineML (Raikov et al., 2011). Particular parameters of a Model can be in a frozen state. This means that their value is fixed, and will not be considered for optimisation. This concept can be useful in multi-stage optimisation in which subgroups of a model are optimised in a sequential fashion.
Another advantage of this abstraction is that a Model is a standalone entity that can be run outside of the Optimisation and have exactly the same Protocols applied to it, generating exactly the same Response. One can also apply extra Protocols to assess how well the model generalizes, or to perfom a sensitivity analysis.

Simulator
Every model simulator should have a subclass of Simulator. Objects of this type will be passed on to objects that are simulator aware, like the Model and Stimuli when their instantiate method is invoked. This architecture allows e.g., the same model object to be run in different simulators. Examples of functionality this class can provide are links to the Python module related to the simulator, the enabling of variable time step integration, etc. Simulators also have run() method that starts the simulation.

Protocol
A Protocol is an object that elicits a Response from a model. In its simplest form it represents, for example, a step current clamp stimulus, but more complicated versions are possible, such as stimulating a set of cells in a network with an elaborate protocol and recording the response. A Protocol can also contain sub-protocols, providing a powerful mechanism to reuse components.

Stimulus, Recording and Response
The Stimulus and Recording objects, which are part of a Protocol are applied to a model and are aware of the simulator used. Subclasses of Stimulus are concepts like current/voltage clamp, synaptic activation, etc. Both of these classes accept a Location specifier. Several Recording objects can be combined in a Response which can be analyzed by an ObjectiveCalculator. At the moment the Recording and Response objects store all their data in memory, but if the need arises, the underlying implementation could also use data models that write to disk.

Location
Specifying the location on a neuron morphology of a recording, stimulus or parameter in a simulator can be complicated. Therefore we created an abstract class Location. As arguments the constructor accepts the location specification, e.g., in NEURON this could be a sectionlist name and an index of the section, or it could point to a section at a certain distance from the soma. Upon request, the object will return a reference to the object at the specified location, this could e.g., be a NEURON section or compartment. At a location, a variable can be set or recorded by a Parameter or Recording, respectively.

ObjectivesCalculator, eFeature
The ObjectivesCalculator takes the Response of a Model and calculates the objective values from it. When using ephys recordings, one can use the eFEL library to extract eFeatures. Examples of these eFeatures are spike amplitudes, steady state voltages, etc. The values of these eFeatures can then be compared with values from experimental data, and a score can be calculated based on the difference between model and experiment (for an example of such a score, see Section 4.2.3. Features are not limited to voltage traces or the eFEL library, but can also be computed on concentrations or any other variables which can be recorded using the Recording class during the experiment.

EXAMPLE USE CASES
To provide hands on experience how real-world optimisations can be developed using the BluePyOpt API, this section provides step-by-step guides for three use-cases. The first is a single compartmental neuron model optimisation, the second is an optimisation of a state-of-the-art morphologically detailed neuron model, and the third is an optimisation of a synaptic plasticity model. All examples to follow assume NEURON default units, i.e., ms, mV, nA, µF cm −2 , etc. (Carnevale and Hines, 2008).

Single Compartmental Model
The first use case shows how to set up an optimisation of single compartmental neuron model with two free parameters: The maximal conductances of the sodium and potassium Hodgkin-Huxley ion channels. This example serves as an introduction for the user to the programming concepts in BluePyOpt. It uses the NEURON simulator as backend.
First we need to import the top-level bluepyopt module. This example will also use BluePyOpt's electrophysiology features, so we also need to import the bluepyopt.ephys subpackage.
import bluepyopt as bpop import bluepyopt.ephys as ephys Next we load a morphology from a file. By default a morphology in NEURON has the following sectionlists: somatic, axonal, apical and basal. We create a Location object (specifically, a NrnSecListLocation object) that points to the somatic sectionlist. This object will be used later to specify where mechanisms are to be added etc. The name argument can be chosen by the user, and should be unique across mechanisms. The prefix argument string should correspond to the SUFFIX field in the NEURON NMODL description file (Carnevale and Hines, 2006) of the channel. The locations argument specifies which sections the mechanism are to be added to.
Next we need to specify the parameters of the model. A parameter can be in two states: frozen and not-frozen. When a parameter is frozen it has an exact known value, otherwise it has well-defined bounds but the exact value is not known yet. The parameter for the specific capacitance of the soma will be a frozen value. Here param_name refers to the name of the parameter in the NEURON simulator namespace, whereas name is a user-specified alias used in BluePyOpt. The two parameters that represent the maximal conductance of the sodium and potassium channels are to be optimised, and are therefore specified as frozen=False, i.e., not-frozen, and bounds for each are provided with the bounds argument. To optimise the parameters of the cell, we further need to create a CellEvaluator object. This object needs to know which protocols to inject, which parameters to optimise, and how to compute a score, so we'll first create objects that define these aspects. A protocol consists of a set of stimuli and a set of responses (i.e., recordings). These responses will later be used to calculate the score of the specific model parameter values. In this example, we will specify two stimuli, two square current pulses delivered at the soma with different amplitudes. To this end, we first need to create a location object for the soma. This object is required in addition to the previously defined somatic_loc, because it points to the compartment in the middle of the soma, whereas the former refers to the entire section list defining the soma. The step_amplitude argument of the NrnSquarePulse specifies the amplitude of the current pulse, and step_delay, step_duration, and total_duration specify the start time, length and total simulation time. Finally, we create a combined protocol that encapsulates both current pulse protocols. twostep_protocol = ephys.protocols.SequenceProtocol ('twostep', protocols=sweep_protocols) Now to compute the model score that will be used by the optimisation algorithm, we define objective objects. For this example, our objective is to match the eFEL "Spikecount" feature to specified values for both current injection amplitudes. In this case, we will create one objective per feature (see Section 4.2.3 for more information about this objective). The eFEL allows for features requiring multiple traces, such as simultaneous somatic and dendritic voltage recordings. The recording_names argument therefore takes a dictionary where the keys are the recording locations. The empty string key denotes the main location, in this case the soma. We then pass these objective definitions to a ObjectivesCalculator object, calculate the total scores from a protocol response.
The CellEvaluator constructor has a field param_names which contains the (ordered) list of names of the parameters that are used as input (and will be fitted later on). Now that we have a cell template and an evaluator for this cell, the IBEADEAPOptimisation object can be created and run.
As an evolutionary algorithm, the IBEA algorithm evolves a population through consecutive generations. For each generation or iteration, a set of offspring individuals are generated from selected parents from the previous generation. When setting up the algorithm, we can specify the size of the offspring population and the maximum number of generations. After a short time (approximately 4 min on a single 2.9 GHz Intel Core i5 core), the optimisation returns the final population, the hall of fame (sorted by the sum of the objectives), a logbook, and an object containing the history of the population during the execution of the algorithm. Figure 4 shows the results in a graphical form. As expected, the solution to the optimisation problem is not unique. Several distinct individuals ( Figure 4A) have a perfect score of 0. We can visualize the region of the solution space explored by the algorithm using a triangular grid plot ( Figure 4B).

Neocortical Pyramidal Cell
Our second use case is a more complex example demonstrating the optimisation of a morphologically detailed model of a thick-tufted layer 5 pyramidal cell (L5PC) from the neocortex ( Figure 5A). This example uses a BluePyOpt port of the stateof-the-art methods for the optimisation of the L5PC model described in Markram et al. (2015). The original model is available online from the Neocortical Microcircuit Collaboration Portal . Due to its complexity, we will not describe the complete optimisation script here. The full code is available from the BluePyOpt website. What we will do here is highlight the particularities of this model compared to the introductory single compartmental model optimisation. As a first validation and point of reference, we ran the BluePyOpt model with its original parameter values from Ramaswamy et al. (2015), as shown in Figure 5B. For clarity, the code for setting the parameters, objective and optimisation algorithm is partitioned into separate modules. Configuration values are stored and read from JavaScript Object Notation JSON files.

Parameters
Evidently, the parameters of this model, as shown in Table 1, far exceed in number those of the single compartmental use case. The parameters marked as frozen are kept constant throughout the optimisation. The parameters to be optimised are the maximal conductances of the ion channels and two values related to the calcium dynamics. The location of the parameters is based on sectionlist names, whereby sections are automatically assigned to the somatic, axonal, apical and basal sectionlists by NEURON when it loads a morphology.
An important aspect of this neuron model is the non-uniform distribution of certain ion channel conductances. For example, the h-channel conductance is specified to increase exponentially with distance from the soma (Kole et al., 2006), as follows.

Protocols
During the optimisation, the model is evaluated using three square current step stimuli applied and recorded at the soma. For these protocols, a holding current is also applied during the entire stimulus, the amplitude of which is the same as was used in the in vitro experiments to keep the cell at a standardized membrane voltage before the step current injection. Another stimulus protocol checks for a backpropagating action potential (bAP) by stimulating the soma with a very short pulse, and measuring the height and width of the bAP at a location of 660 and 800 µm from the soma in the apical dendrite. It is specified as follows.

Objectives
For each of the four stimuli defined above, a set of eFeatures is calculated ( Table 2). These are then compared with the same features extracted from experimental data. As described in Markram et al. (2015), experiments were performed that applied these and other protocols to L5PCs in vitro. For these cells, the same eFeatures were extracted, and the mean µ exp and standard deviation σ exp calculated. The bAP target values are extracted from Larkum et al. (2001).
For every feature value f model , one objective score is calculated: following Druckmann et al. (2007). This approach normalizes all objective scores to a common scale, allowing them to be combined regardless of units, and weights specific objectives according to feature variability.

Optimisation
For the optimisation of this cell model we needed significantly more computing resources. The goal was to find a solution that has objective values that are within approximately 3 standard Locations dend1 and dend2 are respectively 660 and 800 µm from the soma in the apical trunk. Depending on the eFeature type the units can be mV or ms.
deviations from the experimental mean. For this we ran 100 generations with an offspring size of the genetic algorithm of 100 individuals ( Figure 6D). The evaluation of these 100 individuals was parallelized over 50 CPU cores (Intel Xeon 2.60 GHz) using SCOOP, and took about 4 h to run. A diverse set of acceptable solutions was found ( Figure 6C). Figure 6A shows the 10 best solutions in the hall of fame (based on the sum of the objectives). The best solution has objective values that are in the same range as the reference model (Figures 5C, 6B). The profile of the objective scores is not the same for these two models, showing that there are multiple solutions that match the experimental data to a similar degree. Figure 7 shows a comparison of the optimised model to its reference under Gaussian noise current injection (not used during the optimisation).

Spike-Timing Dependent Plasticity (STDP) Model
The BluePyOpt framework was designed to be versatile and broadly applicable to a wide range of neuroscientific optimisation problems. In this use case, we demonstrate this versatility by using BluePyOpt to optimise the parameters of a calcium-based STDP model (Graupner and Brunel, 2012) to summary statistics FIGURE 6 | Results of optimising L5PC model using BluePyOpt. Similar to Figure 5B, with the top ten objective values found by BluePyOpt, and the best one plotted darker (B) Objective scores for the best objective values found by BluePyOpt. The goal of the algorithm is to minimize these values. (C) Parameter diversity in the solution space. Parameter values shown for reference model (red crosses), best (blue crosses), 10 best individuals (black dots) and all individuals with all objectives below 5 (gray dots). Horizontal lines (black) represent the bounds of the parameters, when lower line is missing the bound is 0. (D) Evolution of the L5PC optimisation that found the model in (A). Plot shows the minimal, maximal and average scores found in the consecutive generations of the evolutionary algorithm.
from in vitro experiments (Nevian and Sakmann, 2006). That is, we show how to fit the model to literature data, commonly reported just as mean and SEM of the amount of potentiation (depression) induced by one or more stimulation protocols.
In the set of experiments performed by Nevian and Sakmann (2006), a presynaptic action potential (AP) is paired with a burst of three post-synaptic APs to induce either longterm potentiation (LTP) or long-term depression (LTD) of the postsynaptic neuron response. The time difference t between the presynaptic AP and the postsynaptic burst determines the direction of change: A burst shortly preceding the presynaptic AP causes LTD, with a peak at t = −50 ms; conversely, a burst shortly after the presynaptic AP results in LTP, with a peak at t = +10 ms (Nevian and Sakmann, 2006). The model proposed by Graupner and Brunel (2012) assumes bistable synapses, with plasticity of their absolute efficacies governed by post-synaptic calcium dynamics. That is, each synapse is either in an high-conductance state or a lowconductance state; potentiation and depression translate then into driving a certain fraction of synapses from the lowconductance state to the high-conductance state and vice versa; synapses switch from one state to another depending on the time spent by post-synaptic calcium transients above a potentiation (depression) threshold. Following Graupner and Brunel (2012), the model is described as Frontiers in Neuroinformatics | www.frontiersin.org where ρ is the absolute synaptic efficacy, ρ ⋆ delimits the basins of attraction of the potentiated and depressed state, γ p (γ d ) is the potentiation (depression) rate, is the Heaviside function, θ p (θ d ) is the potentiation (depression) threshold, Noise(t) is an activity dependent noise. The postsynaptic calcium concentration is described by the process c, with time constant τ Ca . C pre is the calcium transient caused by a presynaptic spike occurring at time t i , with a delay D to account for the slow activation of NMDARs, while C post is the calcium transient caused by a postsynaptic spike occurring at time t j .
For periodic stimulation protocols, such as in Nevian and Sakmann (2006), the synaptic transition probability can be easily calculated analytically (Graupner and Brunel, 2012), allowing estimation of the amount of potentiation (depression) induced by the stimulation protocol without actually running any neuron simulations. The amount of potentiation (depression) obtained with different protocols in vitro become the objectives of the optimisation.
A small Python module stdputil calculating this model is available in the example section on the BluePyOpt website. To optimise this model, only an Evaluator class has to be defined that implements an evaluation function:  IBEADEAPOptimisation(GraupnerBrunelEvaluator()) results = opt.run(max_ngen=200) Figure 8 shows the results of the optimisation. As in the other use cases, a large set of acceptable solutions are found by the algorithm. Comparison between model and experimental results; the models match the available in vitro data and predict the outcome of the missing points. In light blue, models generated by individuals having fitness values within one standard error of the mean from experimental in vitro data. In dark blue, best model, defined as the closest to all experimental data points. Experimental data from Nevian and Sakmann (2006) digitized using Rohatgi (2015). (B) Calcium transients generated by the best model (A) for each stimulation protocol. Potentiation and depression thresholds, θ p and θ d respectively, are indicated by the dashed lines. (C) Evolution of the STDP optimisation that found the model in (A). Minimal and average scores found in the consecutive generations of the evolutionary algorithm.

DISCUSSION
BluePyOpt was designed to be a state-of-the-art tool for neuroscientific model parameter search problems that is both easy to use for inexperienced users, and versatile and broadly applicable for power users. Three example use cases were worked through in the text to demonstrate how BluePyOpt serves each of these user communities.
From a software point of view, this dual goal was achieved by an object oriented architecture which abstracts away the domainspecific complexities of search algorithms and simulators, while allowing extension and modification of the implementation and settings of an optimisation. Python was an ideal implementation language for such an architecture, with its very open and minimal approach to extending existing implementations. Object oriented programming allows users to define new subclasses of existing BluePyOpt API classes with different implementations. The duck typing of Python allows parameters and objectives to have any kind of type, e.g., they don't have to be floating point numbers. In extreme cases, function implementations can even be overwritten at run time by monkey patching. These features of Python give extreme flexibility to the user, which will make BluePyOpt applicable to many use cases.
A common issue arising for users of optimisation software is the configuration of computing infrastructure. The fact that BluePyOpt is coded in Python, an interpreted language, and provides Ansible scripts for its installation, makes it straightforward to run on diverse computing platforms. This will give the user the flexibility to pick the computing infrastructure which best fits their needs, be it their desktop computer, university cluster or temporarily rented cloud infrastructure, such as offered by Amazon Web Services.
This present paper focuses on the use of BluePyOpt as an optimisation tool. It is worth noting that the application domain of BluePyOpt needn't remain limited to this. The ephys model abstraction can also be used in validation, assessing generalization, and parameter sensitivity analyses. E.g., when applying a map function to an ephys model evaluation function which takes as input a set of morphologies, one can measure how well the model generalizes when applied to different morphologies. The present paper expressly does not touch on issues of generalization power, overfitting, or uniqueness of solution. It is worth now making a few points on the latter. While BluePyOpt could successfully optimise the three examples, Figures 4B, 6C show a diversity of solutions giving good fitness values. That is, for these neuron model optimisation problems, the solutions found are non-unique. This is compatible with the observation that Nature itself also utilizes various and nonunique solutions to provide the required phenotype (Schulz et al., 2007;Taylor et al., 2009). For other problems solutions could be unique, making BluePyOpt useful e.g., for extracting parameters for models of synapse dynamics (Fuhrmann et al., 2002).
While BluePyOpt significantly reduces the domain specific knowledge required to employ parameter optimisation strategies, some thought from the user in setting up their problem is still required. For example, BluePyOpt does in principle allow brute force optimisation of all parameters of the L5PC model example, including channel kinetics parameters and passive properties, but such an approach would almost certainly be unsuccessful. Moreover, when it comes to assessing fitness of models, care and experience is also required to avoid the optimisation getting caught in local minima, or cannibalizing one objective for another. For neuron models for example, featurebased approaches coupled with multi-objective optimisation strategies have proven especially effective (Druckmann et al., 2007). Indeed, even the stimuli and features themselves can be optimised on theoretical grounds to improve parameter optimisation outcomes (Druckmann et al., 2011). For these reasons, an important companion of BluePyOpt will be a growing library of working optimisation examples developed by domain experts for a variety of common use cases, to help inexperienced users quickly adopt a working strategy most closely related to their specific needs.
As this examples library grows, so too will the capabilities of BluePyOpt evolve. Some improvements planned for the future include the following: Support for multi-stage optimisations allowing for example the passive properties of a neuron to be optimised in a first stage, prior to optimising the full-active dendritic parameters in a second phase Embedded optimisation allowing for example an optimisation of a "current at rheobase" feature requiring threshold detection during the optimisation using e.g., a binary search. Also, for integrate-and-fire models such as the adapting exponential integrate-and-fire (Brette and Gerstner, 2005), a hybrid of a global stochastic search and local gradient descent has been shown to be a competitive approach (Jolivet et al., 2008) Fast pre-evaluation of models to exclude clearly bad parameters before computation time is wasted on them Support for evaluation time-outs to protect against optimisations getting stuck in long evaluations, for example when using NEURON's CVODE solver, which can occasionally get stuck at excessively high resolutions. Support for explicit units to make optimisation scripts more readable, and sharing with others less error prone.
Although parameter optimisations can require appreciable computing resources, the ability to share the code of an optimisation through a light-weight script or ipython notebook using BluePyOpt will improve reproducibility in the field. It allows for neuroscientists to exchange code and knowledge about search algorithms that perform well for particular models. In the future, making it possible for users to read and write model descriptions from community standards (Raikov et al., 2011;Cannon et al., 2014), could further ease the process of plugging in a model into a BluePyOpt optimisation. By providing the neuroscientific community with BluePyOpt, an open source tool to optimise model parameters in Python which is powerful, easy to use and broadly applicable, we hope to catalyse community uptake of state-of-the-art model optimisation approaches, and encourage code sharing and collaboration.

DOWNLOADS
The source code of BluePyOpt, example scripts, cloud installation scripts, documentation and a list of the software dependencies are available on Github at https://github.com/BlueBrain/BluePyOpt, the former under the GNU Lesser General Public License version 3 (LGPLv3), and the latter two under a BSD license.