Artificial Intelligence and Synthesis in Ecology and Evolution

The grand ambition of theorists studying ecology and evolution is to discover the logical and mathematical rules driving the world’s biodiversity at every level, from genetic diversity within species to differences between populations, communities, and ecosystems. This ambition has been difficult to realize, in great part because of the complexity of biodiversity. Theoretical work has led to a complex network of theories, each often having non-obvious consequences for other theories. Case in point: the recent realization that genetic diversity involves a great deal of temporal and spatial stochasticity forces theoretical population genetics to consider abiotic and biotic factors generally reserved to ecosystem ecology. This interconnectedness may require theoretical scientists to adopt new techniques adapted to reasoning about large sets of theories. Mathematicians have solved this problem by using formal languages based on logic to manage theorems. However, theories in ecology and evolution are not mathematical theorems: they involve uncertainty. Recent work in Artificial Intelligence on bridging logic and probability theory offers the opportunity to build rich knowledge bases that combine logic’s ability to represent rich mathematical ideas with probability theory’s ability to model uncertainty. We describe these hybrid languages and explore how they could be used to build unified knowledge bases for ecology and evolution.

of knowledge and offers a concrete answer to the issue of knowledge synthesis. Mizar uses a language powerful enough for the formalization of evolutionary theories envisioned by Lewis, to formalize the result of Queller on Price's theorem and its relationship to other theories, and to build a knowledge base out of Rice's axiomatic theory of evolution. As Lewis wrote, doing so would force us to think more clearly about the theoretical structure of evolution, with theoretical ecology facing a similar state of disorganization. Case in point: theoretical community ecologists have been criticized for focusing on a single prediction for theories capable of making several [46]. An example of this is Hubbell's neutral theory of biodiversity [33], which uses an unrealistic point-mutation model that does not fit with our knowledge of speciation and leads to odd predictions [21,17,18]. In logic-based (also called symbolic) systems like Mizar, all formulas involving speciation would be implicitly linked together. Storing ecological theories in a knowledge base built on a rich logic would then automatically prevent inconsistencies and highlight the consequences of the mathematical theories on all its components.
However important the goal of formalization is, it remains somewhat divorced from an essential aspect of theories in ecology and evolution: their probabilistic and fuzzy nature. A few examples: a surprisingly common idea found in ecological theories is that predators are generally larger than their prey, a key assumption of the food web model of Williams and Martinez [73]; deviations from the Hardy-Weinberg principle are not only common but tend to give important information on selective pressures; and nobody expects the Rosenzweig-MacArthur predator-prey model to be exactly right. In short, important ideas in ecology and evolution do not fit the true/false epistemological framework of systems like Mizar, and ideas do not need to be derived from axiomatic principles to be useful. We are often less concerned with whether a formula can be derived from axioms than with how well it fits a particular dataset. In the 1980s, Artificial Intelligence experts developed probabilistic graphical models to handle large probabilistic systems [54]. While probabilistic graphical models are capable of answering probabilistic queries for large systems of variables, they cannot represent or reason with sophisticated mathematical formulas. Alone, neither logic nor probability theory is enough to elucidate the structure of theories in ecology and evolution.
For decades, researchers have tried to unify probability theory with rich logics to build knowledge bases capable of both the sophisticated mathematical reasoning found in automated theorem provers and the probabilistic reasoning of graphical models. Recent advances have moved us closer to that goal [61,25,71,51,32,67,4]. Using these systems, it is possible to check whether a mathematical formula can be derived from existing results and also to ask probabilistic queries about theories and data. The probabilistic nature of these representations also makes them a good fit for learning complex logical and mathematical formulas from data [38]. Within this framework, there is no longer a sharp distinction between theory and data, since the knowledge base defines a probability distribution over all objects, including logical relationships and mathematical formulas.
For this contribution, we introduce key ideas on methods at the frontier of logic and probability, beginning with a short survey of knowledge representations based on logic and probability. First-order logic is described, along with how it can be used in a probabilistic setting with Markov logic networks [61]. We detail how theories in ecology and evolution can be represented with Markov logic networks and highlight some limitations. Synthesis in ecology and evolution has been made difficult by the sheer number of theories involved and their complex relationships [56]. Practical representations to unify logic and probability are relatively new, but we argue they could be used to achieve greater synthesis by allowing the construction of large, flexible knowledge bases with a mix of mathematical and scientific knowledge.
The formula is implicitly connected to other formulas involving the same symbol, such that if we were to establish a different but equivalent way to represent the speed of light c, it could automatically substitute c in e = mc^2.
Artificial Intelligence researchers have long been interested in expert systems capable of scientific discoveries, or simply capable of storing scientific and medical knowledge in a single coherent system. Dendral, arguably the first expert system, could form hypotheses to help identify new molecules using its knowledge of chemistry [43]. In the 1980s, Mycin was used to diagnose blood infections (and did so more accurately than professionals) [13]. Both systems were based on logic, with Mycin adding a "confidence factor" to its rules to model uncertainty. These expert systems generally relied on a simple logic system not powerful enough to handle uncertainty. With few exceptions, the rules were hand-crafted by human experts, and the systems were unable to discover new rules. After the experts established the logic formulas, the systems acted as static knowledge bases. Algorithms have been developed to learn new logic rules from data [49,50], but the non-probabilistic nature of the resulting knowledge base makes it difficult to handle real-world uncertainty. In addition to expert systems, logic systems are used to store mathematical knowledge and perform automatic theorem proving [29]. Pure logic has rarely been used in ecology and evolution, but recent studies have shown its ability to reconstruct food webs from data [10,69].
There are many different logics for expert systems and automatic theorem proving [29,58,53]. We will focus on first-order logic, the most commonly used logic in efforts to unify logic with probability. A major reason for adopting rich logics, whether first-order or higher-order, is to allow for the complex relationships found in ecology and evolution to be expressed in concise formulas. Stuart Russell noted that "the rules of chess occupy 10^0 pages in first-order logic, 10^5 pages in propositional logic, and 10^38 pages in the language of finite automata" [62]. Similarly, first-order logic will allow us to directly express complex ecological ideas in a simple but formal language.
In mathematics, a function f maps terms X (its domain) to other terms Y (its codomain), f : X → Y. The number of arguments of a function, |X|, is called its arity. The atomic element of first-order logic is the predicate: a function that maps 0 or more terms to a truth value, false or true. In first-order logic, terms are either variables ranging over a domain such as x or city, constants such as 42, Manila, π, or functions mapping terms to other terms such as multiplication, integration, sin, CapitalOf. Variables have to be quantified either universally with ∀ (forall), existentially with ∃ (exists), or uniquely with ∃!. ∀x : p(x) means p(x) must hold true for all possible values of x. ∃x : p(x) means it must hold for at least one value of x, while ∃! means it must hold for exactly one value of x. Using this formal notation, we could write the relationship between the basal metabolic rate (BMR) and body mass (Mass) for mammals [2]:

∀m ∈ Mammal : BMR(m) = 4.1 × Mass(m)^0.75. (1)

This formula has one variable m, which is universally quantified (∀m ∈ Mammal reads "for all m in the set Mammal"). It has two constants, the numbers 4.1 and 0.75, along with four functions (BMR, Mass, multiplication, exponentiation). = is the sole predicate.
A first-order logic formula is either a lone predicate or a complex formula formed by linking formulas using the unary connective ¬ (negation) or binary connectives (and ∧, or ∨, implication ⇒, see table 1). For example, PreyOn(s_x, s_y) is a predicate that maps two species to a truth value, in this case whether the first species preys on the second species, and IsParasite(s) is a predicate that is true if species s is a parasite. We could also have a function Mass(s_x) mapping a species to its weight. We can build more complex formulas from there, for example:

∀s_x : ¬PreyOn(s_x, s_x), (2a)
∀s_x, s_y : PreyOn(s_x, s_y) ⇒ Mass(s_x) > Mass(s_y), (2b)
∀s_x, s_y : ¬IsParasite(s_x) ∧ PreyOn(s_x, s_y) ⇒ Mass(s_x) > Mass(s_y). (2c)

The first formula says that species don't prey on themselves. The second says that predators are larger than their prey (> is a shorthand for the greater than predicate). The third formula refines the second one by adding that predators are larger than their prey unless the predator is a parasite. None of these rules are expected to be true all the time, which is where mixing probability with logic will come in handy. The Rosenzweig-MacArthur equation can also easily be expressed with first-order logic (eq. 3). This formula has four functions: the time differential ẋ ≡ dx/dt, multiplication, addition, and subtraction. Prey n and predators y are universally quantified variables, while r_0, K, C, D, X, δ_0 are constants. The formula has only one predicate, =, and both sides of the formula are connected by ∧, the symbol for conjunction ("and").
A knowledge base K in first-order logic is a set of formulas K = {f_0, f_1, ..., f_{|K|−1}}. First-order logic is expressive enough to represent and manipulate complex logic and mathematical ideas. It can be used for simple ideas such as the claim that predators are generally larger than their prey (eq. 2b), for mathematical formulas such as predator-prey systems (eq. 3), and also to establish the logical relationship between various predicates. We may want a PreyOn predicate to tell us whether s_x preys on s_y, but also a narrower PreyOnAt(s_x, s_y, l) predicate to model whether s_x preys on s_y at a specific location l. In this case, it would be a good idea to have the formula ∀s_x, s_y, l : PreyOnAt(s_x, s_y, l) ⇒ PreyOn(s_x, s_y). Given this formula and the data point PreyOnAt(Wolverine, Rabbit, Quebec), we do not need PreyOn(Wolverine, Rabbit) to be explicitly stated, ensuring the larger metaweb [57] is always consistent with information from local food webs.
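The implication from PreyOnAt to PreyOn can be illustrated with a few lines of Python. This is only a minimal sketch of applying that single rule to stored facts, not a general first-order reasoner; the function name `derive_prey_on` and the second record (Ontario) are hypothetical and added for illustration.

```python
# Sketch: applying the rule
#   forall s_x, s_y, l : PreyOnAt(s_x, s_y, l) => PreyOn(s_x, s_y)
# to a set of local observations stored as tuples.

prey_on_at = {
    ("Wolverine", "Rabbit", "Quebec"),   # data point used in the text
    ("Wolverine", "Rabbit", "Ontario"),  # hypothetical extra record
}


def derive_prey_on(prey_on_at_facts):
    """Return the PreyOn facts implied by the PreyOnAt facts.

    Each local interaction (predator, prey, location) implies the
    location-free interaction (predator, prey).
    """
    return {(pred, prey) for (pred, prey, _location) in prey_on_at_facts}


prey_on = derive_prey_on(prey_on_at)
print(prey_on)
```

Because the metaweb facts are derived rather than stored, they cannot drift out of sync with the local food webs, which is the consistency property described above.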
An interpretation defines which object, predicate, or function is represented by which symbol, e.g., it says PreyOnAt is a predicate with three arguments, two species and one location. The process of replacing variables with constants is called grounding, and we talk of ground terms / predicates / formulas when no variables are present. Together with an interpretation, a possible world assigns a truth value to each possible ground predicate, which can then be used to assign truth values to a knowledge base's formulas. PreyOn(s_x, s_y) can be neither true nor false until we assign constants to the variables s_x and s_y. Constants are typed, so a set of constants C may include two species {Gulo gulo, Orcinus orca} and three locations {Quebec, Fukuoka, Arrakis}. The constants C yield 2^2 × 3 = 12 possible ground predicates for PreyOnAt(s_x, s_y, l):
PreyOnAt(Gulo gulo, Gulo gulo, Quebec)
PreyOnAt(Gulo gulo, Orcinus orca, Quebec)
PreyOnAt(Orcinus orca, Orcinus orca, Quebec)
PreyOnAt(Orcinus orca, Gulo gulo, Quebec)
PreyOnAt(Gulo gulo, Gulo gulo, Fukuoka)
. . .
and only two possible ground predicates for IsParasite:
IsParasite(Gulo gulo)
IsParasite(Orcinus orca)
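Grounding is a mechanical enumeration, so it can be sketched directly. The snippet below is an illustrative Python sketch (variable names are ours) that lists every ground predicate for the two species and three locations above and counts the possible worlds they induce, assuming these two predicates are the only ones in the domain.

```python
from itertools import product

# Typed constants from the example in the text.
species = ["Gulo gulo", "Orcinus orca"]
locations = ["Quebec", "Fukuoka", "Arrakis"]

# Ground PreyOnAt(s_x, s_y, l): every pair of species at every location.
prey_on_at_groundings = [
    f"PreyOnAt({s_x}, {s_y}, {loc})"
    for loc, s_x, s_y in product(locations, species, species)
]

# Ground IsParasite(s): one ground predicate per species.
is_parasite_groundings = [f"IsParasite({s})" for s in species]

# A possible world assigns true/false to each ground predicate, so the
# number of possible worlds doubles with every ground predicate.
n_ground = len(prey_on_at_groundings) + len(is_parasite_groundings)
n_worlds = 2 ** n_ground

print(len(prey_on_at_groundings), len(is_parasite_groundings), n_worlds)
```

Even this toy domain has thousands of possible worlds, which hints at why inference over realistic sets of constants relies on approximate algorithms.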
We say a possible world satisfies a knowledge base (or a single formula) if all the formulas are true given the ground predicates. A basic question in first-order logic is to determine whether a knowledge base K entails a formula f, written K ⊨ f. Formally, the entailment K ⊨ f means that for all possible worlds in which all formulas in K are true, f is also true. More intuitively, it can be read as the formula following from the knowledge base [63].
Probabilistic graphical models, which combine graph theory with probability theory to represent complex probability distributions, can be an alternative to logic-based representations [39,6]. There are primarily two motivations behind probabilistic graphical models. First, even for binary random variables, we need to learn 2^n − 1 parameters for a distribution of n variables. This is unmanageable on many levels: it is computationally difficult to do inference with so many parameters, requires a large amount of memory, and makes it difficult to learn parameters without an unreasonable volume of data [39]. Second, probabilistic graphical models provide important information about independences and the overall structure of the distribution. Probabilistic graphical models were also used as expert systems: Munin had a network of more than 1000 nodes to analyze electromyographic data [20], while PathFinder assisted medical professionals in the diagnosis of lymph-node pathologies [30] (Figure 1).

Table 1: Common binary connectives. The table shows the resulting truth value (T: True, F: False) for all possible combinations. iff is read if and only if. Implication is one of the most common connectives and may have surprising behavior. In particular, it always returns true when the left-hand side is false. While this may seem odd, it allows us to make statements quantified over all real numbers (∀x ∈ R) whose premise is x ≥ 0. Such a formula holds for all real numbers, including negative ones, since with x = −1, x ≥ 0 is false and F ⇒ F returns true.

This preprint (which was not peer-reviewed) is made available under a CC-BY 4.0 International license. The copyright holder is the author/funder. bioRxiv preprint doi: https://doi.org/10.1101/161125
The two key inference problems in probabilistic machine learning are finding the most probable joint state of the unobserved variables (maximum a posteriori, or MAP) and computing conditional probabilities (conditional inference). In a simple presence/absence model for 10 species (s_0, s_1, ..., s_9), given that we know the state of species s_0 = Present, s_1 = Absent, s_2 = Absent, MAP inference would tell us the most likely state for species s_3, ..., s_9, while conditional inference could answer queries such as P(s_4 = Absent | s_0 = Present).
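To make the two query types concrete, here is a small Python sketch that answers both by brute-force enumeration. It is scaled down to three species, and the joint probabilities are hypothetical numbers chosen only so that they sum to one; real systems use graphical-model algorithms rather than a full table.

```python
# Hypothetical joint distribution over three species (s_0, s_1, s_2).
# True means Present, False means Absent. Values are made up.
joint = {
    (True, True, True): 0.20,
    (True, True, False): 0.10,
    (True, False, True): 0.05,
    (True, False, False): 0.15,
    (False, True, True): 0.10,
    (False, True, False): 0.05,
    (False, False, True): 0.05,
    (False, False, False): 0.30,
}


def matches(state, assignment):
    """True if the state agrees with a partial assignment {index: value}."""
    return all(state[i] == value for i, value in assignment.items())


def map_state(joint, evidence):
    """MAP inference: most probable full state consistent with the evidence."""
    consistent = {s: p for s, p in joint.items() if matches(s, evidence)}
    return max(consistent, key=consistent.get)


def conditional(joint, query, evidence):
    """Conditional inference: P(query | evidence) by summing joint entries."""
    p_evidence = sum(p for s, p in joint.items() if matches(s, evidence))
    p_both = sum(
        p for s, p in joint.items()
        if matches(s, evidence) and matches(s, query)
    )
    return p_both / p_evidence


# Given s_0 = Present: most likely states of s_1 and s_2, and
# P(s_2 = Absent | s_0 = Present).
print(map_state(joint, {0: True}))
print(conditional(joint, {2: False}, {0: True}))
```

The table has 2^n entries for n species, which is exactly the blow-up that motivates factorized representations such as Bayesian and Markov networks.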

Markov logic
At this point we have first-order logic, which is capable of manipulating complex logic and mathematical formulas but cannot handle uncertainty, and probabilistic graphical models, which cannot be used to represent mathematical formulas (and thus theories in ecology and evolution) but can handle uncertainty. The limit of first-order logic can be illustrated with our previous example: predators generally have a larger body weight (Mass) than their prey, which we expressed in predicate logic as ∀s_x, s_y : PreyOn(s_x, s_y) ⇒ Mass(s_x) > Mass(s_y), but this is obviously false for some assignments such as s_x: grey wolf and s_y: moose. However, it is still useful knowledge that underpins many ecological theories [73]. When our domain involves a great number of variables, we should expect useful rules and formulas that are not always true.
A core idea behind many efforts to unify rich logics with probability theory is that formulas can be weighted, with higher values meaning we have greater certainty in the formula. In pure logic, it is impossible to violate a single formula. With weighted formulas, an assignment of concrete values to variables is only less likely if it violates formulas, and how much less likely depends on the weight assigned to the violated formula. The higher the weight of the formula violated, the less likely the assignment is. It is conjectured that all perfect numbers are even (∀x : Perfect(x) ⇒ Even(x)); if we were to find a single odd perfect number, that formula would be refuted. This makes sense for mathematics, but in many disciplines, such as biology, important principles are only expected to be true most of the time. If we were to find a single predator smaller than its prey, it would definitely not make our rule useless. The idea of weighted formulas is not new. Markov logic networks (or just Markov logic), invented a decade ago, allow logic formulas to be weighted [61,19]. Similar efforts also use weighted formulas [4,32]. Markov logic supports algorithms to add weights to existing formulas given a data-set, learn new formulas or revise existing ones, and answer probabilistic queries (MAP or conditional). As a case study, Yoshikawa et al. used Markov logic to understand how events in a document were time-related [75]. Their research is a good case study of the interaction between traditional theory-making and artificial intelligence. The formulas they used as a starting point were well-established logic rules to understand temporal expressions. From there, they used Markov logic to weight the rules, adding enough flexibility to their system to beat the best approach of the time. Markov logic makes it simple to grow knowledge: two research labs with different knowledge bases can simply put all their formulas in a single knowledge base; they only need to reevaluate the weights assigned to the formulas. Brouard et al. [12] used Markov logic to understand gene regulatory networks, noting how the resulting model provided clear insights, in contrast to more traditional machine learning techniques.

Figure 1: A Bayesian network factorizes a joint distribution as P(x_1, ..., x_n) = ∏_i P(x_i | Pa(x_i)), where Pa(x_i) is the set of parents of variable x_i. Because no cycles are allowed, the variables form an ordering, so the set Pa(x_i) can only involve variables already seen on the left of x_i. Thus, P(a)P(b|a)P(c) is a valid Bayesian network but P(a)P(b|c)P(c|b) is not. The four vertices represented here were extracted from PathFinder, a Bayesian network with more than 1000 vertices used to help diagnose lymph-node pathologies [30]. The vertices represent four variables related to blood cells and are denoted by a single character (in bold in the figure): C, M, L, G. We denote a positive value with a lowercase letter and a negative value with ¬ (e.g.: C = c, M = ¬m). Since P(¬x|y) = 1 − P(x|y), we need only 2^{|Pa(x)|} parameters per vertex, with |Pa(x)| being the number of parents of vertex x. The structure of Bayesian networks highlights the conditional independence assumptions of the distribution and reduces the number of parameters for learning and inference. As an example query: P(l, ¬c, m, ¬g) = P(l)P(¬c)P(m|¬c)P(¬g|l, ¬c, m) = 0.81 × (1 − 0.65) × 0.27 × (1 − 0.42) = 0.044. See [16] for a detailed treatment of Bayesian networks and [39] for a more general reference on probabilistic graphical models.
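The example query given in the Figure 1 caption can be checked numerically. The Python sketch below simply multiplies the four conditional probabilities quoted there; the variable names are ours, and only the four numbers come from the caption.

```python
# Conditional probabilities quoted in the Figure 1 caption.
p_l = 0.81                     # P(l)
p_c = 0.65                     # P(c), so P(not c) = 1 - 0.65
p_m_given_not_c = 0.27         # P(m | not c)
p_g_given_l_not_c_m = 0.42     # P(g | l, not c, m)

# Chain rule along the network: each vertex is conditioned on its parents.
joint = (
    p_l
    * (1 - p_c)
    * p_m_given_not_c
    * (1 - p_g_given_l_not_c_m)
)

print(round(joint, 3))
```

Only one parameter per parent configuration is stored for each vertex; the probability of the negative value is recovered as one minus the stored value, as done here for ¬c and ¬g.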
In a nutshell, a knowledge base in Markov logic M is a set of weighted formulas M = {(f_0, w_0), (f_1, w_1), ..., (f_{|M|−1}, w_{|M|−1})}.
Given constants C = {c_0, c_1, ..., c_{|C|−1}}, it defines a Markov network (an undirected probabilistic graphical model) which is used to answer probabilistic queries. We will also use an extension of Markov logic called Hybrid Markov logic. Weights are real numbers in the (−∞, ∞) range. The intuition is: the higher the weight associated with a formula, the greater the penalty for violating it (or alternatively: the less likely a possible world that violates it). The cost of an assignment is the sum of the weights of the unsatisfied formulas (those that are false). The higher the cost, the less likely the assignment is. Thus, if a variable assignment violates 12 times a formula with a weight of 0.1 and once a formula with a weight of 1.1, while another variable assignment violates a single formula with a weight of 5, the first assignment will have a higher likelihood (cost of 2.3 vs 5). Formulas with an infinite weight act like formulas in pure logic: they cannot be violated without setting the probability to 0. In short, a knowledge base in pure first-order logic is exactly the same as a knowledge base in Markov logic where all the weights are infinite. In practice, it means mathematical ideas and axioms can easily be added to Markov logic as formulas with an infinite weight. Formulas with weights close to 0 have little effect on the probabilities, since the cost of violating them is small. A formula with a negative weight is expected to be false. It is often assumed that all weights are positive real numbers, without loss of generality, since (f, −w) ≡ (¬f, w).
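The cost comparison in this paragraph can be worked through in a few lines of Python. The sketch below assumes the usual log-linear reading of Markov logic, in which the probability of a possible world is proportional to exp(−cost); the helper name `cost` is ours, and the normalizing constant is left out because it cancels when two worlds are compared.

```python
import math


def cost(violations):
    """Sum of weight * number of violated groundings.

    `violations` is a list of (weight, times_violated) pairs.
    """
    return sum(weight * count for weight, count in violations)


# First assignment: violates a weight-0.1 formula 12 times and a
# weight-1.1 formula once. Second: violates a weight-5 formula once.
first = cost([(0.1, 12), (1.1, 1)])
second = cost([(5.0, 1)])

# Under the log-linear assumption, the ratio of the two probabilities
# depends only on the difference in cost.
ratio = math.exp(-first) / math.exp(-second)

print(first, second, ratio)
```

With these numbers the first assignment costs about 2.3 against 5 for the second, so the first is roughly 15 times more probable. An infinite weight would drive exp(−cost) to 0 for any violating world, recovering pure logic.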
Markov logic can answer queries of complex formulas of the form P(f_0 | f_1, M, C), where f_0 and f_1 are first-order logic formulas, M is a weighted knowledge base, and C is a set of constants. It is important to note that neither f_0 nor f_1 needs to be in M: they can be arbitrary formulas in first-order logic. Logical entailment M ⊨ f is equivalent to finding P(f | M) = 1 [19].
We will build a small knowledge base for an established ecological theory: the niche model of trophic interactions [73]. The first iteration of the niche model posits that all species are described by a niche position N (their body size for instance) in the [0, 1] interval, a diet D in the [0, N] interval, and a range R such that a species preys on all species with a niche in the [D − R/2, D + R/2] interval. We can represent these ideas with three formulas:

∀x, y : ¬PreyOn(x, y), (8a)
∀x : D(x) < N(x), (8b)
∀x, y : PreyOn(x, y) ⇔ D(x) − R(x)/2 ≤ N(y) ∧ N(y) ≤ D(x) + R(x)/2, (8c)

where ∀ reads for all and ⇔ is logical equivalence (see table 1). As pure logic, this knowledge base makes little sense. Formula 8a is obviously not true all the time. It is mostly true, since most pairs of species do not interact. In Markov logic, it is common to have a formula for each lone predicate, painting a rough picture of its marginal probability [19,35]. We could also add that cannibalism is rare, ∀x : ¬PreyOn(x, x), and that predator-prey relationships are generally asymmetrical, ∀x, y : PreyOn(x, y) ⇒ ¬PreyOn(y, x) (although this formula is redundant with the idea that predators are generally larger than their prey). Formulas that are often wrong are assigned a lower weight but can still provide useful information about the system. The second formula says that the diet is smaller than the niche value. The last formula is the niche model: species x preys on y if and only if species y's niche is within the diet interval of x. See Jain [35] for a detailed treatment of knowledge engineering with Markov logic. Using Markov logic and a data-set, we could learn a weight for each formula in the knowledge base. This step alone is useful and provides insights into which formulas hold best in the data. With the resulting weighted knowledge base, we can make probabilistic queries and even attempt to revise the theory automatically. We could find, for example, that the second rule does not apply to parasites or some other group and get a revised rule such as ∀x : ¬IsParasite(x) ⇒ D(x) < N(x).
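Before any weight is learned, it helps to see how often the niche rule disagrees with data. The Python sketch below grounds the niche formula (x preys on y if and only if N(y) falls in x's diet interval) over three hypothetical species and lists the groundings where the prediction and the observed links differ. All species values and observed links are invented for illustration, and this counting step is not the weight-learning algorithm itself.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Species:
    name: str
    niche: float        # N, in [0, 1]
    diet: float         # D, in [0, N]
    diet_range: float   # R


def niche_predicts(pred, prey):
    """Right-hand side of the niche rule: N(prey) in [D - R/2, D + R/2]."""
    low = pred.diet - pred.diet_range / 2
    high = pred.diet + pred.diet_range / 2
    return low <= prey.niche <= high


# Hypothetical species and observed PreyOn links (predator, prey).
species = [
    Species("a", niche=0.90, diet=0.50, diet_range=0.40),
    Species("b", niche=0.60, diet=0.20, diet_range=0.20),
    Species("c", niche=0.15, diet=0.05, diet_range=0.10),
]
observed = {("a", "b"), ("b", "c"), ("a", "c")}

# A grounding violates the equivalence when the observation and the
# niche prediction disagree.
violations = [
    (x.name, y.name)
    for x, y in product(species, species)
    if ((x.name, y.name) in observed) != niche_predicts(x, y)
]

print(violations, len(violations), "of", len(species) ** 2)
```

A formula violated in only a small fraction of its groundings would be expected to receive a high weight, while one violated about half the time would end up near zero.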
See Box I for an example of Markov logic network applied to an ecological data-set.

Fuzziness
First-order logic provides a formal language for expressing mathematical and logical ideas, while probability theory provides a framework for reasoning about uncertainty. A third dimension often found in discussions on unifying logic with probability is fuzziness. A struggle with applying Markov logic to ecology is that all predicates are either true or false, that is, Markov logic defines a distribution over binary predicates. Going back to the Rosenzweig-MacArthur system (eq. 3), this formula's weight in Markov logic is almost certainly going to be zero, since it is never exactly right. If the Rosenzweig-MacArthur equation predicts a population size of 94 and we observe 93, the formula is false. Weighted formulas help us understand how often a formula is true, but in the end the formula has to give a binary truth value: true or false, with no place for nuance. Many frameworks solve this by allowing all predicates to return truth values in the [0, 1] range, with 0 being completely false, 1 being completely true, and anything in between denoting nuances between those extremes [76,7]. This fuzzy approach is used in both probabilistic soft logic [37,4] and deep learning approaches to predicate logic [77,32]. It adds flexibility but in practice prevents conditional queries, since they would require a probability distribution over all truth values. Hybrid Markov logic [71,19] extends Markov logic by allowing not only weighted formulas but also numeric terms, along with soft equality, which applies a Gaussian penalty to deviations from equality. Soft equality is a good fit for formulas like the Rosenzweig-MacArthur system. Hybrid Markov logic is not as well developed as standard Markov logic (for example, there are no algorithms to learn new formulas from data), but it solves much of the problem that fuzzy approaches solve while retaining the ability to answer conditional queries. Several languages for reasoning have combined fuzziness with probability or logic (Figure 2).
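The contrast between hard and soft equality can be sketched numerically. The Python snippet below assumes that soft equality contributes −w × (a − b)^2 to the log-probability, which is one common way to obtain a Gaussian penalty; the exact form used by a given hybrid Markov logic implementation may differ, and the function names and the weight of 0.5 are ours.

```python
def hard_equality(predicted, observed):
    """Binary predicate: true only when the two values match exactly."""
    return predicted == observed


def soft_equality_score(predicted, observed, weight):
    """Assumed log-potential of soft equality: -weight * squared deviation.

    Exponentiating this score gives a Gaussian-shaped penalty centred on
    predicted == observed.
    """
    return -weight * (predicted - observed) ** 2


# Population sizes from the text: the model predicts 94, we observe 93.
exact = hard_equality(94, 93)
near = soft_equality_score(94, 93, weight=0.5)

# A hypothetical, much worse prediction for comparison.
far = soft_equality_score(94, 60, weight=0.5)

print(exact, near, far)
```

Under hard equality both predictions are simply false, whereas the soft score separates a near miss from a large error, which is the behaviour needed for quantitative models that are never exactly right.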
It has been argued that, in the context of Bayesian reasoning, fuzziness plays an important role in bridging logic with probability [52,34]. However, how to effectively combine rich logics with probability theory remains an open question, and so is the role of fuzziness.

Bayesian higher-order probabilistic programming
Probabilistic programming languages are programming languages built to easily describe probabilistic models and simplify the inference process. Stan [14] and BUGS [44] are two popular examples of probabilistic programming languages used for Bayesian inference. More flexible languages for Bayesian probabilistic programming have recently emerged. These languages, like Church [27] and Anglican [74], accept higher-order constructs (that is, functions accepting other functions as arguments). The ambition is that "ultimately we would like simply to be able to do probabilistic programming using any existing programming language as the modeling language" [70]. Bayesian higher-order probabilistic programming languages (BHOPPLs) may hold the key to sound inference mixed with an even richer logic than first-order logic. Indeed, most modern systems to formalize mathematics are based on type theory (higher-order logic) [53]. A formal description of type theory is beyond the scope of this text (see [22,53]). Crucially, programming languages also heavily rely on type theory [55]. Coq, one of the most popular languages for automated theorem proving, is just a programming language with a strict type system and algorithms to support theorem proving [45].
This leads to a dilemma. Software-wise, BHOPPLs are well ahead of the approaches described in previous sections such as hybrid Markov logic networks. Current higher-order probabilistic programming languages operate on variants of well-known languages: Anglican is based on Clojure [74], Pyro is based on Python [9], Turing.jl uses Julia [24]. Many BHOPPLs have been designed to exploit the high-performance architecture developed for deep learning systems. Using GPUs (graphics cards) has been important to the development of fast learning and inference in deep learning [26].

Figure 2: Probabilistic graphical models combine probability theory with graph theory to represent complex distributions [39]. Alternatives to probability theory for reasoning about uncertainty include possibility theory and Dempster-Shafer belief functions, see [28] for an extended discussion. In the green rectangle: Fuzzy logic extends standard logic by allowing truth values to be anywhere in the [0, 1] interval. Fuzziness models vagueness and is particularly popular in linguistics, engineering, and bioinformatics, where complex concepts and measures tend to be vague by nature. See [41] for a detailed comparison of probability and fuzziness. In the purple rectangle: languages capable of modelling mathematical formulas. It is important to note that while first-order logic is expressive enough to express a large class of mathematical ideas, many languages rely on a restricted form of first-order logic without functions. Alone, these languages are not powerful enough to express scientific ideas; we must thus focus on what lies at their intersection. Type-2 Fuzzy Logic is a fast-expanding [64,47] extension to fuzzy logic, which, in a nutshell, models uncertainty by considering the truth value itself to be fuzzy [48,78]. Markov logic networks [61,19] extend predicate logic with weights to unify probability theory with logic. Probabilistic soft logic [37,3] also has formulas with weights, but allows the predicates to be fuzzy, i.e. have truth values in the [0, 1] interval. Some recent deep learning studies also combine all three aspects [23,32].

Table 2
Function                                             Meaning
P_PreyOn : species × species → [0, 1]                Probability that a species preys on another
PreyOn : species × species → bool                    Predator-prey relationship
PreyOnAt : species × species × location → bool       Predator-prey relationship at a given location
PresenceAt : species × location → bool               Presence of a species at a location
IsParasite : species → bool                          Whether the species is a parasite
IsGaller : species → bool                            Whether the species is a galler
IsSalix : species → bool                             Whether the species is a salix
CloselyRelated : species × species → bool            Whether two species are closely related
Occ : species → {location}                           Set of locations where a species is found
Cooccurrence : species × species → R+                Proportion of locations where the species co-occur
HighCooccurrence : species × species → bool          Pair of species with high co-occurrence
There are no open-source implementations of Markov logic networks running on GPUs. In contrast, Pyro [9] is a BHOPPL built on top of PyTorch, one of the most popular frameworks for deep learning, allowing computation to be distributed on systems of GPUs. On the other hand, while in theory BHOPPLs may support the richer logics used in formalizing modern mathematics, in practice higher-order probability theory is itself not well understood. This is an active research topic [70], but formalization faces serious issues. For one, there are incompatibilities with the standard measure-theoretic foundation of probability theory, which may require rethinking how probability theory is formulated [11,67,66,31,65]. First-order logic is among the most studied formal languages, making it easy to use a first-order knowledge base with various software. The current informal nature of BHOPPLs makes them hard to recommend for the synthesis of knowledge in ecology and evolution, even though they may very well hold the most potential.

Box I: Markov logic and the Salix tritrophic system
The primary goal of unifying logic and probability is to be able to grow knowledge bases of formulas in a clear, precise language. For Markov logic, it means a set of formulas in first-order logic. For this example, we use Markov logic to build a knowledge base for ecological interactions around the Salix data-set [40]. The Salix data-set has 126 parasites, 96 species of gallers (insects), and 52 species of salix, forming a tritrophic ecological network (Parasite → Galler → Salix). Furthermore, we have partial phylogenetic information for the species, their presence/absence in 374 locations, interactions, and some environmental information on the locations. To fully illustrate the strengths and limits of Markov logic in this setting, we will not limit ourselves to the data available for this particular data-set (e.g. we do not have body mass for all species). Data in first-order logic can be organized as a set of tables (one for each predicate). For our example, we have a table named PreyOnAt with three columns (its arguments) and a table named IsParasitoid with only one column. This format implies the closed-world assumption: if an entry is not found, it is false (see table 3 for an example). For this problem we defined several functions and predicates to describe everything from predator-prey relationships to whether pairs of species often co-occurred, along with information on locations such as humidity, precipitation, and temperature (see table 2). We ran the basic learning algorithm from Alchemy-2 [61], which is used both to learn new formulas and to weight them. The weights are listed at the end of each formula. We use the ? character at the end of formulas involving data that was unavailable for this data-set (and thus for which we could not learn the weight).
As a sample, the algorithm returned these three formulas along with their weights. The first two formulas correctly define the tritrophic relationship between parasites, gallers, and Salix, while the third shows a solid, but weaker, relationship between predation and co-occurrence. Formula 9d would require hybrid Markov logic and a fuzzy predicate ≈.
Integration of macroecology and food web ecology may rely on a better understanding of macroecological rules [5]. These rules are easy to express in first-order logic; equation 9e is a formulation of Bergmann's rule. We also used the learning algorithm to test whether closely related species had similar prey, but the weight attributed to the formula was almost zero, telling us the formula was right as often as it was wrong: ∀s₀, s₁: CloselyRelated(s₀, s₁) ∧ PreyOn(s₀, s₂) ⇒ PreyOn(s₁, s₂), 0.00.
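To make the idea of a formula being "right as often as it was wrong" concrete, the sketch below enumerates the groundings of the rule CloselyRelated(s₀, s₁) ∧ PreyOn(s₀, s₂) ⇒ PreyOn(s₁, s₂) on a toy data-set and counts how often the conclusion holds when the premise is true. It is a simple diagnostic under stated assumptions, not the weight-learning algorithm of Alchemy-2: the species names, the tables, and the function `count_groundings` are all hypothetical.

```python
from itertools import product

# Hypothetical toy data; none of these values come from the Salix data-set.
species = ["a", "b", "c", "x", "y"]
closely_related = {("a", "b"), ("b", "a")}
prey_on = {("a", "x"), ("b", "x"), ("a", "y")}


def count_groundings(species, closely_related, prey_on):
    """Count groundings whose premise is true, split by whether
    the conclusion PreyOn(s1, s2) also holds."""
    satisfied = violated = 0
    for s0, s1, s2 in product(species, repeat=3):
        premise = (s0, s1) in closely_related and (s0, s2) in prey_on
        if not premise:
            continue  # the implication is trivially true here; not counted
        if (s1, s2) in prey_on:
            satisfied += 1
        else:
            violated += 1
    return satisfied, violated


satisfied, violated = count_groundings(species, closely_related, prey_on)
print(satisfied, violated)  # prints "2 1"
```

When the two counts are close, the data give little support either for or against the rule, which is the situation described above for closely related species and their prey.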
This example shows both the promise and the current issues with hybrid logic-probabilistic techniques. Many of the predicates would benefit from being fuzzy. For example, PreyOn should take different values depending on how often predation occurs, and we had to use arbitrary cut-offs for predicates like CloselyRelated and HighTemperature. Fortunately, many of the recent approaches integrate logic with both fuzziness and probability theory [1,32,4]. Weights are useful to understand which relationships are strong in the data, and this example shows the beginning of a knowledge base for food web ecology. The next step would be to discover new formulas, whether manually or using machine learning algorithms, and add data to revise the weights. If a formula involves a predicate operating on food webs and we want to apply our knowledge base to a data-set without food webs, this formula will simply be ignored (because it will have no grounded predicates to evaluate; see section 1). This is a strong advantage of this knowledge representation: our small knowledge base can be used as a basis for any other ecological data-set, even one that shares little with it. With time, it is possible to grow an increasingly connected knowledge base, linking ideas from different fields together.
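The way a knowledge base carries over to a data-set lacking some predicates can be illustrated with a minimal Python sketch. It is a simplification of what a Markov logic engine does (a formula without groundings contributes nothing to inference), and the formula names, the predicate PresentAt, and the function `applicable_formulas` are hypothetical.

```python
# Hypothetical knowledge base: each formula lists the predicates it uses.
knowledge_base = [
    {"name": "food_web_rule", "predicates": {"CloselyRelated", "PreyOn"}},
    {"name": "climate_rule", "predicates": {"HighTemperature", "PresentAt"}},
]


def applicable_formulas(kb, available_predicates):
    """Keep only formulas whose predicates can all be grounded in the data."""
    return [f["name"] for f in kb if f["predicates"] <= available_predicates]


# A data-set with occurrence and climate data but no food web:
print(applicable_formulas(knowledge_base, {"HighTemperature", "PresentAt"}))
# prints "['climate_rule']"
```

The food web formula stays in the knowledge base and becomes usable again as soon as a data-set provides groundings for its predicates.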
5 Where's our unreasonably effective paradigm?
Legitimate abstractions can often obscure how closely various subfields are related. Natural selection is a good example. Many formulas in population genetics rely on fitness. Nobody disputes the usefulness of this abstraction: it allows us to think about changes in populations without worrying whether selection is caused by predation or climate change. On the other hand, fitness has also allowed theoretical population genetics to develop almost independently of ecology. There is a realization that much of the complexity of evolution is related to how selection varies in time and space, which puts evolution in ecology's backyard [8]. Achieving Lewis' goal of formalization would not prevent the use of fitness, but having formulas with fitness cohabiting with formulas explaining the components of fitness would implicitly link ecology and evolution. This goes in both directions: what are the consequences of new discoveries on speciation and adaptive radiations for the formation of metacommunities? How can community dynamics explain the extinction and persistence of new species? If there is no single theory of biodiversity, the imperative is to understand biodiversity as a system of theories. Given the scope of ecology and evolution and the vast number of theories involved, it seems difficult to achieve a holistic understanding without some sort of formal system to see how the pieces of the puzzle fit together.
Connolly et al. noted how theories for metacommunities were divided between those derived from first principles and those based on statistical methods [15]. In systems unifying rich logics with a probabilistic representation, this distinction does not exist: theories are fully realized as both symbolic and statistical entities. Efforts to bring theories in ecology and evolution into a formal setting should be primarily seen as an attempt to put them in context, to force us to be explicit about our assumptions and to see how our ideas interact [68]. Recent experience in linguistics has shown that building a knowledge base capable of handling several problems at the same time yields better results than attacking each problem in isolation, because of the problems' interconnectedness [75].
Yet, despite recent progress at the frontier of logic and probability, there are still practical and theoretical issues to overcome to make a large database of knowledge for ecology and evolution possible. Inference can be difficult in rich knowledge representations, not all methods have robust open-source implementations, and some approaches, such as Bayesian higher-order probabilistic programming, are themselves not well understood. Moreover, while mathematicians benefit from decades of experience building large databases of theorems, there has been no such effort for ecology and evolution. Given the variety of knowledge representations and the complexity of knowledge in ecology and evolution, it will be difficult to know exactly how to build a unified knowledge base without actually trying.
bioRxiv preprint doi: https://doi.org/10.1101/161125. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.