The Mathematical Structure of Integrated Information Theory

Integrated Information Theory is one of the leading models of consciousness. It aims to describe both the quality and quantity of the conscious experience of a physical system, such as the brain, in a particular state. In this contribution, we propound the mathematical structure of the theory, separating the essentials from auxiliary formal tools. We provide a definition of a generalized IIT which has IIT 3.0 of Tononi et al., as well as the Quantum IIT introduced by Zanardi et al., as special cases. This provides an axiomatic definition of the theory which may serve as the starting point for future formal investigations and as an introduction suitable for researchers with a formal background.


Introduction
Integrated Information Theory (IIT), developed by Giulio Tononi and collaborators, has emerged as one of the leading scientific theories of consciousness [OAT14, MGRT16, TBMK16, MMA+18, KMBT16]. At the heart of the theory is an algorithm which, based on the level of integration of the internal functional relationships of a physical system in a given state, aims to determine both the quality and quantity ('Φ value') of its conscious experience.
While promising in itself, the mathematical formulation of the theory is not yet satisfactory. The presentation in terms of examples and accompanying explanation veils the essential mathematical structure of the theory and impedes philosophical and scientific analysis. In addition, the current definition of the theory can only be applied to quite simple classical physical systems [Bar14], which is problematic if the theory is taken to be a fundamental theory of consciousness that should eventually be reconciled with our present theories of physics.
To resolve these problems, we examine the essentials of the IIT algorithm and formally define a generalized notion of Integrated Information Theory. This notion captures the inherent mathematical structure of IIT and offers a rigorous mathematical definition of the theory which has 'classical' IIT 3.0 of Tononi et al. [OAT14, MGRT16, MMA+18] as well as the more recently introduced Quantum Integrated Information Theory of Zanardi, Tomka and Venuti [ZTV18] as special cases. In addition, this generalization allows us to extend classical IIT, freeing it from a number of simplifying assumptions identified in [BM19].
In the associated article [TK20] we show more generally how the main notions of IIT, including causation and integration, can be treated, and an IIT defined, starting from any suitable theory of physical systems and processes described in terms of category theory. Restricting to classical or quantum processes then yields each of the above as special cases. This treatment makes IIT applicable to a large class of physical systems and helps overcome the current restrictions.

[Figure 1: diagram in which the map E sends "Physical systems and states" to "Spaces and states of conscious experience".]
Figure 1. An Integrated Information Theory specifies for every system in a particular state its conscious experience, described formally as an element of an experience space. In our formalization, this is a map E : Sys → Exp from the system class Sys into a class Exp of experience spaces, which, first, sends each system S to its space of possible experiences E(S), and, second, sends each state s ∈ St(S) to the actual experience the system is having when in that state, St(S) → E(S), s ↦ E(S, s). The definition of this map in terms of axiomatic descriptions of physical systems, experience spaces and further structure used in classical IIT is given in the first half of this paper.
Our definition of IIT may serve as the starting point for further mathematical analysis of IIT, in particular if related to category theory [TTS16, NTS19]. It also provides a simplification and mathematical clarification of the IIT algorithm which extends the technical analysis of the theory [Bar14, Teg15, Teg16] and may contribute to its ongoing critical discussion [Bay18, MSB19, MRCH+19, TK15]. The concise presentation of IIT in this article should also help to make IIT more easily accessible for mathematicians, physicists and other researchers with a strongly formal background.
1.1. Structure of article. We begin by introducing the necessary ingredients of a generalised Integrated Information Theory in Sections 2 to 4, namely physical systems, spaces of conscious experience and cause-effect repertoires. Our approach is axiomatic in that we state only the precise formal structure which is necessary to apply the IIT algorithm. In Section 5, we introduce a simple formal tool which allows us to present the definition of the algorithm of an IIT in a concise form in Sections 6 and 7. Finally, in Section 8, we summarise the full definition of such a theory.
Following this we give several examples including IIT 3.0 in Section 9 and Quantum IIT in Section 10. In Section 11 we discuss how our formulation allows one to extend classical IIT in several fundamental ways, before discussing further modifications to our approach and other future work in Section 12. Finally, the appendix includes a detailed explanation of how our generalization of IIT coincides with its usual presentation in the case of classical IIT.

Systems
The first step in defining an Integrated Information Theory (IIT) is to specify a class Sys of physical systems to be studied. Each element S ∈ Sys is interpreted as a model of one particular physical system. In order to apply the IIT algorithm, it is only necessary that each element S come with the following features.
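Definition 1. A system class Sys is a class each of whose elements S, called systems, comes with the following data:
1. a set St(S) of states;
2. for every state s ∈ St(S), a set Sub_s(S) ⊂ Sys of subsystems and, for each M ∈ Sub_s(S), an induced state of M;
3. a set D_S of decompositions, with a distinguished trivial decomposition 1 ∈ D_S;
4. for each z ∈ D_S, a corresponding cut system S^z ∈ Sys and, for each state s ∈ St(S), a corresponding cut state s^z ∈ St(S^z).
We assume that the number of subsystems of a system is finite and that it changes neither with the state nor under cuts.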

Experience
An IIT aims to specify for each system in a particular state its conscious experience. As such, it will require a mathematical model of such experiences. Examining classical IIT, we find the following basic features of the final experiential states it describes which are needed for its algorithm.
Firstly, each experience e should crucially come with an intensity, given by a number ||e|| in the non-negative reals R+ (including zero). This intensity will finally correspond to the overall intensity of experience, usually denoted by Φ. Next, in order to compare experiences, we require a notion of distance d(e, e′) between any pair of experiences e, e′. Finally, the algorithm will require us to be able to rescale any given experience e to have any given intensity. Mathematically, this is most easily encoded by letting us multiply any experience e by any number r ∈ R+. In summary, a minimal model of experience in a generalised IIT is the following.
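Definition 2. An experience space is a set E with:
1. an intensity function ||.|| : E → R+;
2. a distance function d : E × E → R+;
3. a scalar multiplication R+ × E → E, denoted (r, e) ↦ r · e, satisfying ||r · e|| = r · ||e||, r · (s · e) = (rs) · e and 1 · e = e
for all e ∈ E and r, s ∈ R+.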
We remark that this same axiomatisation will apply both to the full space of experiences of a system and to the spaces describing components of the experiences ('concepts' and 'proto-experiences' defined in later sections). We note that the distance function does not necessarily have to satisfy the axioms of a metric. While this and further natural axioms such as d(r · e, r · f) = r · d(e, f) might hold, they are not necessary for the IIT algorithm.
The above definition is very general, and in any specific theory the experiences may come with richer further structure. The following example describes the experience space used in classical IIT.
Example 3. Any metric space (X, d) may be extended to an experience space X̄ := X × R+ in various ways. E.g., one can define ||(x, r)|| = r, r · (x, s) = (x, rs) and define the distance as
d((x, r), (y, s)) := r · d(x, y).
This is the definition used in classical IIT (cf. Section 9 and Appendix A).
An important operation on experience spaces is taking their product.
Definition 4. For experience spaces E and F, we define the product to be the space E × F with distance
d((e, f), (e′, f′)) := d(e, e′) + d(f, f′),
intensity ||(e, f)|| = max{||e||, ||f||} and scalar multiplication r · (e, f) = (r · e, r · f). This generalises to any finite product ∏_{i∈I} E_i of experience spaces.
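Definitions 2 to 4 translate directly into a small data structure. The following Python sketch is illustrative only: the names ExperienceSpace, from_metric and product are hypothetical, the distance on X × R+ is taken to be r · d(x, y) for elements (x, r) and (y, s), and the distance on a product is taken to be the sum of the component distances. Both distance choices are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ExperienceSpace:
    """Minimal model of an experience space: intensity, distance, scaling."""
    intensity: Callable[[Any], float]        # ||e||
    distance: Callable[[Any, Any], float]    # d(e, f), not necessarily a metric
    scale: Callable[[float, Any], Any]       # r . e


def from_metric(metric: Callable[[Any, Any], float]) -> ExperienceSpace:
    """Extend a metric space X to X x R+; elements are pairs (x, r)."""
    return ExperienceSpace(
        intensity=lambda e: e[1],
        # ASSUMPTION: the distance is weighted by the intensity of the first argument.
        distance=lambda e, f: e[1] * metric(e[0], f[0]),
        scale=lambda r, e: (e[0], r * e[1]),
    )


def product(E: ExperienceSpace, F: ExperienceSpace) -> ExperienceSpace:
    """Product of two experience spaces; elements are pairs (e, f)."""
    return ExperienceSpace(
        intensity=lambda p: max(E.intensity(p[0]), F.intensity(p[1])),
        # ASSUMPTION: component distances are added.
        distance=lambda p, q: E.distance(p[0], q[0]) + F.distance(p[1], q[1]),
        scale=lambda r, p: (E.scale(r, p[0]), F.scale(r, p[1])),
    )


# Usage: the real line with metric |x - y|, extended to an experience space.
line = from_metric(lambda x, y: abs(x - y))
e, f = (1.0, 2.0), (4.0, 5.0)
pair_space = product(line, line)
```

Note that the distance of this sketch is not symmetric, which is compatible with the remark that d need not satisfy the axioms of a metric.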

Repertoires
In order to define the experience space and individual experiences of a system S, an IIT utilizes basic building blocks called 'repertoires', which we will now define. Next to the specification of a system class, this is the essential data necessary for the IIT algorithm to be applied.
Each repertoire describes a way of 'decomposing' experiences, in the following sense. Let D denote any set with a distinguished element 1, for example the set D_S of decompositions of a system S, where the distinguished element is the trivial decomposition 1 ∈ D_S.
Definition 5. Let e be an element of an experience space E. A decomposition of e over D is a mapping ē : D → E with ē(1) = e.
In more detail, a repertoire specifies a proto-experience for every pair of subsystems and describes how this experience changes if the subsystems are decomposed. This allows one to assess how integrated the system is with respect to a particular repertoire. Two repertoires are necessary for the IIT algorithm to be applied, together called the cause-effect repertoire.
For subsystems M, P ∈ Sub_s(S), define D_{M,P} := D_M × D_P. This set describes the decomposition of both subsystems simultaneously. It has a distinguished element 1 = (1_M, 1_P).
Definition 6. A cause-effect repertoire at S is given by a choice of experience space PE(S), called the space of proto-experiences, and for each s ∈ St(S) and M, P ∈ Sub_s(S), a pair of elements
caus_s(M, P), eff_s(M, P) ∈ PE(S)
and for each of them a decomposition over D_{M,P}.
Examples of cause-effect repertoires will be given in Sections 9 and 10. A general definition in terms of process theories is given in [TK20]. For the IIT algorithm, a cause-effect repertoire needs to be specified for every system S, as in the following definition.
Definition 7. A cause-effect structure is a specification of a cause-effect repertoire for every S ∈ Sys such that
PE(S^z) = PE(S) and D_{S^z} = D_S for all z ∈ D_S. (4)
The names 'cause' and 'effect' highlight that the definitions of caus_s(M, P) and eff_s(M, P) in classical and quantum IIT describe the causal dynamics of the system. More precisely, they are intended to capture the manner in which the 'current' state s of the system, when restricted to M, constrains the 'previous' or 'next' state of P, respectively.

Integration
We have now introduced all of the data required to define an IIT; namely, a system class along with a cause-effect structure. From this, we will give an algorithm aiming to specify the conscious experience of a system. Before proceeding to do so, we introduce a conceptual short-cut which allows the algorithm to be stated in a concise form. This captures the core ingredient of an IIT, namely the computation of how integrated an entity is.
Finally, the integration scaling of a pair e_1, e_2 of such elements is the pair ι(e_1, e_2) := (φ · ê_1, φ · ê_2), where φ := min(φ(e_1), φ(e_2)) is the minimum of their integration levels.
We will also need to consider indexed collections of decomposable elements. Let S be a system in a state s ∈ St(S) and assume that for every M ∈ Sub_s(S) an element e_M of some experience space E_M with a decomposition over some D_M is given. We call (e_M)_{M∈Sub_s(S)} a collection of decomposable elements, and denote it as (e_M)_M.
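The operations used in this section can be made concrete in a few lines. The sketch below is a hedged illustration, not the paper's definition: it assumes that the integration level φ(e) of an element with a decomposition ē over D is the minimal distance d(e, ē(z)) over the non-trivial z ∈ D, that the normalization ê rescales e to intensity 1 (leaving elements of intensity 0 unchanged), and that the core of a collection is the member with maximal integration level. Elements are modelled as pairs (x, r) on the real line with the hypothetical distance r · |x − y|; all function names are illustrative.

```python
from typing import Dict, Hashable, Tuple

Elem = Tuple[float, float]   # (point on the real line, intensity)
TRIVIAL = "1"                # label of the distinguished element of D


def intensity(e: Elem) -> float:
    return e[1]


def scale(r: float, e: Elem) -> Elem:
    return (e[0], r * e[1])


def distance(e: Elem, f: Elem) -> float:
    # ASSUMPTION: intensity-weighted distance on X x R+ with X the real line.
    return e[1] * abs(e[0] - f[0])


def normalize(e: Elem) -> Elem:
    """Rescale e to intensity 1; elements of intensity 0 are left unchanged."""
    n = intensity(e)
    return scale(1.0 / n, e) if n != 0 else e


def integration_level(dec: Dict[Hashable, Elem]) -> float:
    """ASSUMPTION: minimal distance between e = dec[TRIVIAL] and its non-trivial decompositions."""
    e = dec[TRIVIAL]
    # default=0.0 is a choice of this sketch for the case D = {1}.
    return min((distance(e, v) for z, v in dec.items() if z != TRIVIAL), default=0.0)


def pair_scaling(dec1: Dict[Hashable, Elem], dec2: Dict[Hashable, Elem]) -> Tuple[Elem, Elem]:
    """Integration scaling of a pair: both elements normalised, then scaled by the smaller level."""
    phi = min(integration_level(dec1), integration_level(dec2))
    return scale(phi, normalize(dec1[TRIVIAL])), scale(phi, normalize(dec2[TRIVIAL]))


def core(collection: Dict[Hashable, Dict[Hashable, Elem]]) -> Hashable:
    """ASSUMPTION: the core is the index whose element has maximal integration level."""
    return max(collection, key=lambda m: integration_level(collection[m]))


# Usage: two decomposable elements indexed by hypothetical subsystems "A" and "B".
dec_a = {TRIVIAL: (0.0, 2.0), "z1": (3.0, 1.0), "z2": (1.0, 1.0)}
dec_b = {TRIVIAL: (5.0, 1.0), "z1": (5.5, 1.0)}
```

With these inputs the first element has the larger integration level, so it would be selected as the core of the collection {"A": dec_a, "B": dec_b}.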

Constructions - Mechanism Level
Let S ∈ Sys be a physical system whose experience in a state s ∈ St(S) is to be determined. The first level of the algorithm involves fixing some subsystem M ∈ Sub_s(S), referred to as a 'mechanism', and associating to it an object called its 'concept', which belongs to the concept space C(S) := PE(S) × PE(S). For fixed M, letting the second subsystem P ∈ Sub_s(S), called a 'purview', vary, the repertoire values form two collections of decomposable elements, caus_s(M) := (caus_s(M, P))_P and eff_s(M) := (eff_s(M, P))_P.
The concept of M is then defined as the core integration scaling of this pair of collections,
C_{S,s}(M) := core integration scaling of (caus_s(M), eff_s(M)). (10)
It is an element of C(S). Unravelling our definitions, the concept thus consists of the values of the cause and effect repertoires at their respective 'core' purviews P_c, P_e, i.e. those which make them 'most integrated'. These values caus(M, P_c) and eff(M, P_e) are then each rescaled to have intensity given by the minimum of their two integration levels.

Constructions - System Level
The second level of the algorithm specifies the experience of the system S in state s. To this end, all concepts of a system are collected to form its Q-shape, defined as
Q_s(S) := (C_{S,s}(M))_{M∈Sub_s(S)}. (11)
This is an element of the space
E(S) := C(S)^{n(S)}, (12)
where n(S) := |Sub_s(S)|, which is finite and independent of the state s according to our assumptions. We can also define a Q-shape for any cut of S. Let z ∈ D_S be a decomposition, S^z the corresponding cut system and s^z the corresponding cut state. We define
Q_s(S^z) := (C_{S^z,s^z}(M))_{M∈Sub_{s^z}(S^z)}.
Because of (4), and since the number of subsystems remains the same when cutting, Q_s(S^z) is also an element of E(S). This gives a map
D_S → E(S), z ↦ Q_s(S^z),
which is a decomposition of Q_s(S) over D_S. Considering this map for every subsystem of S gives a collection of decompositions defined as
Q(S, s) := (Q_{s_M}(M))_{M∈Sub_s(S)}, (13)
where s_M denotes the state of M induced by s.
Definition 10. The actual experience of the system S in the state s ∈ St(S) is
E(S, s) := core integration scaling of Q(S, s). (14)
The definition implies that E(S, s) ∈ E(M), where M ∈ Sub_s(S) is the core of the collection Q(S, s), called the major complex. It describes which part of the system S is actually conscious. In most cases there will be a natural embedding E(M) ↪ E(S) for a subsystem M of S, allowing us to view E(S, s) as an element of E(S) itself. Assuming this embedding to exist allows us to define an Integrated Information Theory concisely in the next section.

Integrated Information Theories
We can now summarise all that we have said about IITs.
Definition 11. An Integrated Information Theory is determined as follows. The data of the theory is a system class Sys along with a cause-effect structure. The theory then gives a mapping
E : Sys → Exp
into the class Exp of all experience spaces, sending each system S to its space of experiences E(S) defined in (12), and a mapping
St(S) → E(S), s ↦ E(S, s),
which determines the experience of the system when in a state s, defined in (14). The quantity of the system's experience is given by
Φ(S, s) := ||E(S, s)||,
and the quality of the system's experience is given by the normalized experience Ê(S, s). The experience is located in the core of the collection Q(S, s), called the major complex, which is a subsystem of S.
In the next sections we specify the data of several example IITs.

Classical IIT
In this section we show how IIT 3.0 [MMA+18, MGRT16, Ton15, OAT14] fits into the framework developed here. A detailed explanation of how our earlier algorithm fits with the usual presentation of IIT is given in Appendix A. In [TK20] we give an alternative categorical presentation of the theory.
9.1. Systems. We first describe the system class underlying classical IIT. Physical systems S are considered to be built up of several components S_1, . . . , S_n, called elements. Each element S_i comes with a finite set of states St(S_i), equipped with a metric. A state of S is given by specifying a state of each element, so that
St(S) = St(S_1) × · · · × St(S_n).
We define a metric d on St(S) by summing over the metrics of the element state spaces St(S_i) and denote the collection of probability distributions over St(S) by P(S). Note that we may view St(S) as a subset of P(S) by identifying any s ∈ St(S) with its Dirac distribution δ_s ∈ P(S), which is why we occasionally abbreviate δ_s by s in what follows. Additionally, each system comes with a probabilistic (discrete) time evolution operator or transition probability matrix, sending each s ∈ St(S) to a probabilistic state T(s) ∈ P(S). Equivalently it may be described as a convex-linear map
T : P(S) → P(S). (18)
Furthermore, the evolution T is required to satisfy a property called conditional independence, which we define shortly.
where p · p′ denotes the multiplication of these probability distributions to give a probability distribution over S. Next, the marginal of T over M is defined as the map
⟨M|T : P(S) → P(M⊥)
such that for each p ∈ P(S) and m_2 ∈ St(M⊥) we have
⟨M|T(p)(m_2) = Σ_{m_1∈St(M)} T(p)(m_1, m_2).
In particular we write T_i := ⟨S_i⊥|T for each i = 1, . . . , n. Conditional independence of T may now be defined as the requirement that
T(s) = ∏_{i=1}^{n} T_i(s) for all s ∈ St(S),
where the right-hand side is again a probability distribution over St(S).
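Conditional independence can be checked mechanically for small systems. The following Python sketch assumes binary elements and represents a distribution over St(S) as a dictionary from state tuples to probabilities; the helper names (make_evolution, marginal, is_conditionally_independent) and the example update rules are hypothetical. It builds an evolution T from per-element update rules, so that T(s) factorises by construction, and contrasts it with a correlated evolution that does not.

```python
from itertools import product as cartesian


def states(n):
    """All states of n binary elements, i.e. St(S) = {0, 1}^n."""
    return list(cartesian((0, 1), repeat=n))


def make_evolution(rules):
    """Build T from per-element rules, where rules[i](s) = P(element i is 1 next | s)."""
    n = len(rules)

    def T(s):
        dist = {}
        for nxt in states(n):
            prob = 1.0
            for i, rule in enumerate(rules):
                p_on = rule(s)
                prob *= p_on if nxt[i] == 1 else 1.0 - p_on
            dist[nxt] = prob
        return dist

    return T


def marginal(dist, keep):
    """Marginalise a distribution over St(S) onto the elements with indices in keep."""
    out = {}
    for state, prob in dist.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


def is_conditionally_independent(T, n, tol=1e-12):
    """Check that T(s) equals the product of its single-element marginals for every state s."""
    for s in states(n):
        joint = T(s)
        singles = [marginal(joint, (i,)) for i in range(n)]
        for nxt, prob in joint.items():
            expected = 1.0
            for i in range(n):
                expected *= singles[i][(nxt[i],)]
            if abs(prob - expected) > tol:
                return False
    return True


# Hypothetical two-element system: element 0 noisily copies element 1,
# element 1 is a noisy OR of both elements.
rules = [lambda s: 0.9 if s[1] else 0.1,
         lambda s: 0.8 if (s[0] or s[1]) else 0.2]
T = make_evolution(rules)


def correlated(s):
    """Counterexample: both elements always agree at the next step."""
    return {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
```

The check is carried out on Dirac states only, which matches the way the evolution is specified here, namely by its values T(s) on states s ∈ St(S).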
where ω_M ∈ P(M) denotes the uniform distribution on St(M). This is interpreted in the graph depiction as removing all those edges from the graph whose source is in M⊥ and whose target is in M. The corresponding input of the target element is replaced by noise, i.e. the uniform probability distribution over the source element.
9.4. Proto-Experiences. For each system S, the first Wasserstein metric (or 'Earth Mover's Distance') makes P(S) a metric space (P(S), d). The space of proto-experiences of classical IIT is
PE(S) := P̄(S), (24)
where P̄(S) is defined as in Example 3. Thus elements of PE(S) are of the form (p, r) for some p ∈ P(S) and r ∈ R+, with distance function, intensity and scalar multiplication as defined in the example.
9.5. Repertoires. It remains to define the cause-effect repertoires. Fixing a state s of S, the first step will be to define maps caus′_s and eff′_s which send any choice of (M, P) ∈ Sub(S) × Sub(S) to an element of P(P). These should describe the way in which the current state of M constrains that of P in the next or previous time-steps.
We begin with the effect repertoire. For a single element purview P_i we define eff′_s(M, P_i) by (25), where s_M denotes (the Dirac distribution of) the restriction of the state s to M. While it is natural to use the same definition for arbitrary purviews, IIT 3.0 in fact uses another definition based on consideration of 'virtual elements' [MMA+18, MGRT16, Ton15], which also makes calculations more efficient [MMA+18, Supplement S1]. For general purviews P, this definition is
eff′_s(M, P) := ∏_{P_i∈P} eff′_s(M, P_i), (26)
taking the product over all elements P_i in the purview P.
Next, for the cause repertoire, for a single element mechanism M_i and each s̃ ∈ St(P), we define caus′_s(M_i, P)[s̃] by (27), where λ is the unique normalisation scalar making caus′_s(M_i, P) a valid element of P(P). Here, for clarity, we have indicated evaluation of probability distributions at particular states by square brackets. If the time evolution operator has an inverse T^{-1}, this cause repertoire could be defined similarly to (25) using T^{-1}, but classical IIT does not utilize this definition. For general mechanisms M, we then define
caus′_s(M, P)[s̃] := κ ∏_{M_i∈M} caus′_s(M_i, P)[s̃], (28)
where the product is over all elements M_i in M and where κ ∈ R+ is again a normalisation constant.
We may at last now define
caus_s(M, P) := caus′_s(M, P) · caus′_s(∅, P⊥),
eff_s(M, P) := eff′_s(M, P) · eff′_s(∅, P⊥),
with intensity 1 when viewed as elements of PE(S). Here, the dot indicates again the multiplication of probability distributions and ∅ denotes the empty mechanism. The distributions caus′_s(∅, P⊥) and eff′_s(∅, P⊥) are called the unconstrained cause and effect repertoires over P⊥.
Remark 12. It is in fact possible for the right-hand side of (27) to be equal to 0 for all s̃ for some M_i ∈ M. In this case we set caus_s(M, P) = (ω_S, 0) in PE(S).
Finally we must specify the decompositions of these elements over D_{M,P}. For any partitions z_M = (M_1, M_2) of M and z_P = (P_1, P_2) of P, we define the corresponding decomposed repertoires, where we have abused notation by equating each subset M_1 and M_2 of nodes with their induced subsystems of S via the state s.
This concludes all data necessary to define classical IIT. If the generalized definition of Section 8 is applied to this data, it yields precisely classical IIT 3.0 defined by Tononi et al. In Appendix A, we explain in detail how our definition of IIT, equipped with this data, maps to the usual presentation of the theory.

Quantum IIT
In this section, we consider quantum IIT defined in [ZTV18]. This is also a special case of the definition in terms of process theories we give in [TK20].

Repertoires.
We finally come to the definition of the cause-effect repertoire. Unlike classical IIT, the definition in [ZTV18] does not consider virtual elements. Let a system S in state s ∈ St(S) be given. As in Section 9.5, we utilize maps caus′_s and eff′_s which here map subsystems M and P to St(P). They are defined as

Extensions of Classical IIT
The physical systems to which IIT 3.0 may be applied are limited in a number of ways: they must have a discrete time-evolution, satisfy Markovian dynamics and exhibit a discrete set of states [BM19]. Since many physical systems do not satisfy these requirements, if IIT is to be taken as a fundamental theory about reality, it must be extended to overcome these limitations.
In this section, we show how IIT can be redefined to cope with continuous time, non-Markovian dynamics and non-compact state spaces, by a redefinition of the maps (26) and (28) and, in the case of non-compact state spaces, a slightly different choice of (24), while leaving all of the remaining structure as it is. While we do not think that our particular definitions are satisfactory as a general definition of IIT, these results show that the disentanglement of the essential mathematical structure of IIT from auxiliary tools (the particular definition of cause-effect repertoires used to date) can help to overcome fundamental mathematical or conceptual problems.
In Section 11.3, we also explain which solution to the problem of non-canonical metrics is suggested by our formalism.
11.1. Discrete Time and Markovian Dynamics. In order to avoid the requirement of a discrete time and Markovian dynamics, instead of working with the time evolution operator (18), we define the cause-effect repertoires in reference to a given trajectory of a physical state s ∈ St(S). The resulting definitions can be applied independently of whether trajectories are being determined by Markovian dynamics in a particular application, or not.
Let t ∈ I denote the time parameter of a physical system. If time is discrete, I is an ordered set. If time is continuous, I is an interval of reals. For simplicity, we assume 0 ∈ I. In the deterministic case, a trajectory of a state s ∈ St(S) is simply a curve in St(S), which we denote by (s(t))_{t∈I} with s(0) = s. For probabilistic systems (such as neural networks with a probabilistic update rule), it is a curve of probability distributions in P(S), which we denote by (p(t))_{t∈I}, with p(0) equal to the Dirac distribution δ_s. The latter case includes the former, again via Dirac distributions.
In what follows, we utilize the fact that in physics, state spaces are defined such that the dynamical laws of a system allow one to determine the trajectory of each state. Thus for every s ∈ St(S), there is a trajectory (p_s(t))_{t∈I} which describes the time evolution of s.
The idea behind the following is to define, for every M, P ∈ Sub(S), a trajectory p_s^{(P,M)}(t) in P(P) which quantifies how much the state of the purview P at time t is being constrained by imposing the state s at time t = 0 on the mechanism M. This gives an alternative definition of the maps (26) and (28), while the rest of classical IIT can be applied as before.
Let now M, P ∈ Sub(S) and s ∈ St(S) be given. We first consider the time evolution of the state (s_M, v) ∈ St(S), where s_M denotes the restriction of s to St(M) as before and where v ∈ St(M⊥) is an arbitrary state of M⊥. We denote the time evolution of this state by p_{(s_M,v)}(t) ∈ P(S). Marginalizing this distribution over P⊥ gives a distribution on the states of P, which we denote as p^P_{(s_M,v)}(t) ∈ P(P). Finally, we average over v using the uniform distribution ω_{M⊥}. Because state spaces are finite in classical IIT, this averaging can be defined pointwise for every w ∈ St(P) by
p_s^{(P,M)}(t)(w) := κ Σ_{v∈St(M⊥)} p^P_{(s_M,v)}(t)(w), (31)
where κ is the unique normalization constant which ensures that p_s^{(P,M)}(t) ∈ P(P).
The probability distribution p_s^{(P,M)}(t) ∈ P(P) describes how much the state of the purview P at time t is being constrained by imposing the state s on M at time t = 0, as desired. Thus, for every t ∈ I, we have obtained a mapping of two subsystems M, P to an element p_s^{(P,M)}(t) of P(P), which can be used to replace the maps (26) and (28). The remainder of the definitions of classical IIT can then be applied as before.
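The construction of this section (fix s on M, range over all states v of M⊥, evolve, marginalise onto P, then average and normalise) can be sketched as follows. This is a minimal illustration under stated assumptions: elements are binary, subsystems are given as tuples of element indices, a trajectory is supplied as a function evolve(state, t) returning a dictionary from states to probabilities, and swap_dynamics is a purely hypothetical toy dynamics.

```python
from itertools import product as cartesian


def marginal(dist, keep):
    """Marginalise a distribution over St(S) onto the elements with indices in keep."""
    out = {}
    for state, prob in dist.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


def constrained_distribution(evolve, s, M, P, t):
    """Distribution on St(P) at time t obtained by imposing s on M at time 0.

    The elements outside M are averaged uniformly over all of their states v.
    """
    complement = [i for i in range(len(s)) if i not in M]
    summed = {}
    for v in cartesian((0, 1), repeat=len(complement)):
        full = list(s)                       # keep s on M ...
        for i, value in zip(complement, v):
            full[i] = value                  # ... and put v on the complement of M
        for w, prob in marginal(evolve(tuple(full), t), P).items():
            summed[w] = summed.get(w, 0.0) + prob
    total = sum(summed.values())             # 1 / total plays the role of kappa
    return {w: prob / total for w, prob in summed.items()}


def swap_dynamics(state, t):
    """Hypothetical deterministic dynamics: two elements swap states at every odd time."""
    a, b = state
    return {((b, a) if t % 2 else (a, b)): 1.0}


# Imposing state 1 on element 0 fully determines element 1 one step later,
# while element 0 itself is left completely unconstrained.
on_other = constrained_distribution(swap_dynamics, (1, 0), M=(0,), P=(1,), t=1)
on_self = constrained_distribution(swap_dynamics, (1, 0), M=(0,), P=(0,), t=1)
```

Because only a trajectory function is required, the same sketch applies whether or not the underlying dynamics is Markovian.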
11.2. Discrete Set of States. The problem with applying the definitions of classical IIT to systems with continuous state spaces (e.g. neuron membrane potentials [BM19]) is that in certain cases, uniform probability distributions do not exist. E.g., if the state space of a system S consists of the positive real numbers R+, no uniform distribution can be defined which has a finite total volume, so that no uniform probability distribution ω_S exists. It is important to note that this problem is less universal than one might think. E.g., if the state space of the system is a closed and bounded subset of R+, e.g. an interval [a, b] ⊂ R+, a uniform probability distribution can be defined using measure theory, which is in fact the natural mathematical language for probabilities and random variables. Nevertheless, the observation in [BM19] is correct that if a system has a non-compact continuous state space, ω_S might not exist, which can be considered a problem w.r.t. the above-mentioned working hypothesis.
This problem can be resolved for all well-understood physical systems by replacing the uniform probability distribution ω_S by some other mathematical entity which allows one to define a notion of averaging states. An example is quantum theory (Section 10), whose state spaces are continuous and non-compact. Here, the maximally mixed state of S plays the role of the uniform probability distribution. For all relevant classical systems with non-compact state spaces (whether continuous or not), the same is true: there exists a canonical uniform measure µ_S which allows one to define the cause-effect repertoires similarly to the last section, as we now explain. Examples of this canonical uniform measure are the Lebesgue measure for subsets of R^n [Rud06], or the Haar measure for locally compact topological groups [Sal16] such as Lie groups.
In what follows, we explain how the construction of the last section needs to be modified in order to be applied to this case. In all relevant classical physical theories, St(S) is a metric space in which every probability measure is a Radon measure, in particular locally finite, and where a canonical locally finite uniform measure µ_S exists. We define P_1(S) to be the space of probability measures whose first moment is finite. For these, the first Wasserstein metric (or 'Earth Mover's Distance') W_1 exists, so that (P_1(S), W_1) is a metric space.
As before, the dynamical laws of the physical systems determine for every state s ∈ St(S) a time evolution p_s(t), which here is an element of P_1(S). Integration of this probability measure over St(P⊥) yields the marginal probability measure p^P_s(t). As in the last section, we may consider these probability measures for the states (s_M, v) ∈ St(S), where v ∈ St(M⊥).
Since µ_S is not normalizable, we cannot define p_s^{(P,M)}(t) as in (31), for the result might be infinite. Using the fact that µ_S is locally finite, we may, however, define a somewhat weaker equivalent. To this end, we note that for every state s_{M⊥}, the local finiteness of µ_{M⊥} implies that there is a neighbourhood N_{s,M⊥} in St(M⊥) for which µ_{M⊥}(N_{s,M⊥}) is finite. We choose a sufficiently large neighbourhood which satisfies this condition. Assuming p^P_{(s_M,v)}(t) to be a measurable function in v, for every A in the σ-algebra of St(P), we can thus define
p_s^{(P,M)}(t)(A) := ∫_{N_{s,M⊥}} p^P_{(s_M,v)}(t)(A) dµ_{M⊥}(v),
which is a finite quantity. The p_s^{(P,M)}(t) so defined is non-negative, vanishes for A = ∅ and satisfies countable additivity. Hence it is a measure on St(P) as desired, but might not be normalizable.
All that remains for this to give a cause-effect repertoire as in the last section is to make sure that any measure (normalized or not) is an element of PE(S). The theory is flexible enough to do this by setting d(µ, ν) = |µ − ν|(St(P)) if either µ or ν is not in P_1(S), and d(µ, ν) = W_1(µ, ν) otherwise. Here, |µ − ν| denotes the total variation of the signed measure µ − ν, and |µ − ν|(St(P)) is the volume thereof [oM13, Hal74]. This finally allows one to construct cause-effect repertoires as in the last section.
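The case distinction in this distance can be illustrated on a toy state space. The sketch below assumes measures with finite support on the points 0, 1, ..., k−1 of the real line (unit spacing), represented as lists of non-negative weights; under that assumption every measure has a finite first moment, so membership in P_1 reduces to having total mass one. The one-dimensional formula for W_1 via cumulative sums is a standard identity, and the function names are hypothetical.

```python
def is_probability(mu, tol=1e-9):
    """A finitely supported measure lies in P_1 exactly when its total mass is one."""
    return abs(sum(mu) - 1.0) <= tol


def wasserstein_1_line(mu, nu):
    """W_1 between probability measures on the points 0, 1, ..., k-1 of the real line.

    In one dimension W_1 is the integral of the absolute difference of the
    cumulative distribution functions; with unit spacing this is a finite sum.
    """
    total, cdf_gap = 0.0, 0.0
    for a, b in zip(mu[:-1], nu[:-1]):
        cdf_gap += a - b
        total += abs(cdf_gap)
    return total


def total_variation(mu, nu):
    """|mu - nu|(St(P)): total mass of the variation of the signed measure mu - nu."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def distance(mu, nu):
    """W_1 if both arguments are normalised, the total variation volume otherwise."""
    if is_probability(mu) and is_probability(nu):
        return wasserstein_1_line(mu, nu)
    return total_variation(mu, nu)


# Two point masses at opposite ends of a three-point space, and an
# unnormalised measure of total mass two.
left, right, heavy = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [2.0, 0.0, 0.0]
```

For the two normalised point masses the distance is the transport cost between the end points, whereas any pair involving the unnormalised measure falls back to the total variation.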
11.3. Non-Canonical Metrics. Another criticism of IIT's mathematical structure mentioned in [BM19] is that the metrics used in IIT's algorithm are, to a certain extent, chosen arbitrarily. Different choices indeed imply different results of the algorithm, both concerning the quantity and quality of experience, which can be considered problematic.
The resolution of this problem is, however, not so much a technical as a conceptual or philosophical task, for what is needed to resolve this issue is a justification of why a particular metric should be used. Various justifications are conceivable, e.g. identification of desired behaviour of the algorithm when applied to simple systems. When considering our mathematical reconstruction of the theory, the following natural justification offers itself. Implicit in our definition of the theory as a map from systems to experience spaces is the idea that the mathematical structure of experience spaces (Definition 2) reflects the phenomenological structure of experience. This is so, most crucially, for the distance function d, which describes how similar two elements of experience spaces are. Since every element of an experience space corresponds to a conscious experience, it is natural to demand that the similarity of the two mathematical objects should reflect the similarity of the experiences they describe. Put differently, the distance function d of an experience space should in fact mirror (or "model") the similarity of conscious experiences as experienced by an experiencing subject.
This suggests that the metrics d used in the IIT algorithm should, ultimately, be defined in terms of the phenomenological structure of similarity of conscious experiences. For the case of colour qualia, this is in fact feasible [Kle19, Example 3.18], [Kue10, SWD04]. In general, in our view, the mathematical structure of experience spaces should be intimately tied to the phenomenology of experience.

Summary & Outlook
In this article, we have propounded the mathematical structure of Integrated Information Theory. First, we have studied which exact structures the IIT algorithm uses in the mathematical description of physical systems, on the one hand, and in the mathematical description of conscious experience, on the other. Our findings are the basis of definitions of a physical system class Sys and a class Exp of experience spaces, and allowed us to view IIT as a map Sys → Exp.
Next, we needed to disentangle the essential mathematics of the theory from auxiliary formal tools used in the contemporary definition. To this end, we have introduced the precise notion of decomposition of elements of an experience space required by the IIT algorithm. The pivotal cause-effect repertoires are examples of decompositions so defined, which allowed us to view any particular choice, e.g. the one of 'classical' IIT developed by Tononi et al., or the one of 'quantum' IIT recently introduced by Zanardi et al., as data provided to a general IIT algorithm.
The formalization of cause-effect repertoires in terms of decompositions then led us to define the essential ingredients of IIT's algorithm concisely in terms of integration levels, integration scalings and cores. These definitions describe and unify recurrent mathematical operations in the contemporary presentation, and finally allowed us to define IIT completely in terms of a few lines of definition.
Throughout the paper, we have taken great care to make sure our definitions reproduce exactly the contemporary version of IIT 3.0. The result of our work is a mathematically rigorous and general definition of Integrated Information Theory. This definition can be applied to any meaningful notion of systems and cause-effect repertoires, and we have shown that this allows one to overcome most of the mathematical problems of the contemporary definition identified to date in the literature.
We believe that our mathematical reconstruction of the theory can be the basis for refined mathematical and philosophical analysis of IIT. We also hope that this mathematisation may make the theory more amenable to study by mathematicians, physicists, computer scientists and other researchers with a strongly formal background.
12.1. Process Theories. Our generalization of IIT is axiomatic in the sense that we have only included those formal structures in the definition which are necessary for the IIT algorithm to be applied. This ensured that our reconstruction is as general as possible, while still true to IIT 3.0. As a result, several notions used in classical IIT, e.g., system decomposition, subsystems or causation, are merely defined abstractly at first, without any reference to the usual interpretation of these concepts in physics.
In the related article [TK20], we show that these concepts can be meaningfully defined in any suitable process theory of physics, formulated in the language of symmetric monoidal categories. This approach can describe both classical and quantum IIT and yields a complete formulation of contemporary IIT in a categorical framework.
12.2. Further Development of IIT. IIT is constantly under development, with new and refined definitions being added every few years. We hope that our mathematical analysis of the theory might help to contribute to this development. For example, the working hypothesis that IIT is a fundamental theory, i.e. describes reality as it is, implies that technical problems of the theory need to be resolved. We have shown that our formalization allows us to address the technical problems mentioned in the literature. However, there are others which we have not addressed in this paper.
Most crucially, the IIT algorithm uses a series of maximization and minimization operations, unified in the notion of core subsystems in our formalization. In general, there is no guarantee that these operations lead to unique results, neither in classical nor in quantum IIT. Using different cores has a major impact on the output of the algorithm, including the Φ value, so that the output is not well-defined in such cases. Furthermore, the contemporary definition of IIT as well as our formalization rely on there being a finite number of subsystems of each system, which might not be the case in reality. Our formalization may be extendable to the infinite case by assuming that every system has a fixed but potentially infinite indexing set $\mathrm{Sub}(S)$, so that each $\mathrm{Sub}_s(S)$ is the image of a mapping $\mathrm{Sub}(S) \times \mathrm{St}(S) \to \mathrm{Sys}$, but we have not considered this in detail in this paper.
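The non-uniqueness of cores can be illustrated with a minimal sketch. The function name `cores`, the purview labels and the numerical integration levels below are hypothetical and chosen only for illustration; they are not taken from the formalization itself. The sketch collects every candidate that attains the maximal integration level, which makes ties visible instead of silently selecting one of them.

```python
def cores(levels, tol=1e-12):
    """Return all candidates whose integration level attains the maximum.

    `levels` maps a candidate label (e.g. a purview) to its integration level.
    More than one returned label means the core is not unique.
    """
    best = max(levels.values())
    return sorted(k for k, v in levels.items() if abs(v - best) <= tol)


# Hypothetical integration levels for three candidate purviews.
levels = {"P1": 0.25, "P2": 0.5, "P3": 0.5}
tied = cores(levels)
print(tied)
```

If the returned list has more than one entry, any subsequent step that depends on "the" core has to make an additional choice, which is the ambiguity described above.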
Finally, concerning more operational questions, it would be desirable to develop the connection to empirical measures such as the Perturbational Complexity Index PCI [CCR+16, CGR+13] in more detail, as well as to define a controlled approximation of the theory whose calculation is less expensive. Both of these tasks may be achievable by substituting parts of our formalization with simpler mathematical structure.
On the conceptual side of things, it would be desirable to have a better understanding of how the mathematical structure of experience spaces corresponds to the phenomenology of experience, both for the general definition used in our formalization and the specific definitions used in classical and quantum IIT. In particular, it would be desirable to understand how it relates to the important notion of qualia, which is often asserted to have characteristic features such as ineffability, intrinsicality, non-contextuality, transparency or homogeneity [Met06]. For a first analysis towards this goal, cf. [Kle19].
A.3. Algorithm - Mechanism Level. Next, we explicitly unpack our form of the IIT algorithm to see how it compares in the case of classical IIT with [OAT14]. In our formalism, the integrated information $\phi$ of a mechanism $M$ of system $S$ when in state $s$ is defined in Equation (10). This definition conjoins several steps in the definition of classical IIT. To explain why it corresponds exactly to classical IIT, we disentangle this definition step by step.
First, consider $\mathrm{caus}_s(M, P)$ in Equation (9). This is, by definition, a decomposition map. The calculation of the integration level of this decomposition map, cf. Equation (5), amounts to comparing $\mathrm{caus}_s(M, P)$ to the cause-effect repertoire associated with every decomposition using the metric of the target space $PE(S)$, which for classical IIT is defined in (24) and Example 3, so that the metric $d$ used for comparison is indeed the Earth Mover's Distance. Since cause-effect repertoires have, by definition, unit intensity, the factor $r$ in the definition (1) of the metric does not play a role at this stage. Therefore, the integration level of $\mathrm{caus}_s(M, P)$ is exactly the integrated cause information, denoted as $\phi^{\mathrm{MIP}}_{\mathrm{cause}}(y_t, Z_{t-1})$ in [Ton15], where $y_t$ denotes the (induced state of the) mechanism $M$ in this notation, and $Z_{t-1}$ denotes the purview $P$. Similarly, the integration level of $\mathrm{eff}_s(M, P)$ is exactly the integrated effect information, denoted as $\phi^{\mathrm{MIP}}_{\mathrm{effect}}(y_t, Z_{t+1})$.
The integration scaling in (10) simply changes the intensity of an element of $PE(S)$ to match the integration level, using the scalar multiplication, which is important for the system level definitions. When applied to $\mathrm{caus}_s(M, P)$, this would result in an element of $PE(S)$ whose intensity is precisely $\phi^{\mathrm{MIP}}_{\mathrm{cause}}(y_t, Z_{t-1})$.
Consider now the collections (9) of decomposition maps. Applying Definition 9, the core of $\mathrm{caus}_s(M)$ is that purview $P$ which gives the decomposition $\mathrm{caus}_s(M, P)$ with the highest integration level, i.e. with the highest $\phi^{\mathrm{MIP}}_{\mathrm{cause}}(y_t, Z_{t-1})$. This is called the core cause $P^c$ of $M$, and similarly the core of $\mathrm{eff}_s(M)$ is called the core effect $P^e$ of $M$.
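The two operations just described, namely taking the integration level of a decomposition map and then selecting the core among all purviews, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the total variation distance stands in for the Earth Mover's Distance, the label `"trivial"` marks the undecomposed repertoire, and all names and numbers (`integration_level`, `core`, `caus`, the toy repertoires) are hypothetical.

```python
def total_variation(p, q):
    """Stand-in distance (assumption): half the L1 distance of two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def integration_level(decomposition_map, distance):
    """Minimal distance between the undecomposed repertoire and any decomposed one.

    `decomposition_map` maps decomposition labels to repertoires; the label
    "trivial" (an assumption of this sketch) holds the undecomposed repertoire.
    """
    whole = decomposition_map["trivial"]
    return min(
        distance(whole, repertoire)
        for label, repertoire in decomposition_map.items()
        if label != "trivial"
    )


def core(collection, distance):
    """Return the purview with the highest integration level, and that level."""
    levels = {p: integration_level(f, distance) for p, f in collection.items()}
    best = max(levels, key=levels.get)
    return best, levels[best]


# Toy cause repertoires of one mechanism over two candidate purviews.
caus = {
    "P1": {"trivial": [0.5, 0.5, 0.0, 0.0],
           "z1": [0.25, 0.25, 0.25, 0.25],
           "z2": [0.5, 0.25, 0.25, 0.0]},
    "P2": {"trivial": [1.0, 0.0, 0.0, 0.0],
           "z1": [0.5, 0.5, 0.0, 0.0],
           "z2": [0.25, 0.25, 0.25, 0.25]},
}
core_cause, phi_cause = core(caus, total_variation)
print(core_cause, phi_cause)
```

In this toy example the decomposition attaining the minimum plays the role of the minimum information partition, and the purview returned by `core` plays the role of the core cause.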
Finally, to fully account for (10), we note that the integration scaling of a pair of decomposition maps rescales both elements to the minimum of the two integration levels. Hence the integration scaling of the pair $(\mathrm{caus}_s(M, P), \mathrm{eff}_s(M, P'))$ fixes the scalar value of both elements to be exactly the integrated information, denoted as $\phi(y_t, Z_{t\pm1}) = \min(\phi^{\mathrm{MIP}}_{\mathrm{cause}}, \phi^{\mathrm{MIP}}_{\mathrm{effect}})$ in [Ton15], where $P = Z_{t-1}$ and $P' = Z_{t+1}$.
In summary, the following operations are combined in Equation (10). The core of $(\mathrm{caus}_s(M), \mathrm{eff}_s(M))$ picks out the core cause $P^c$ and core effect $P^e$. The core integration scaling subsequently considers the pair $(\mathrm{caus}_s(M, P^c), \mathrm{eff}_s(M, P^e))$, called the maximally irreducible cause-effect repertoire, and determines the integration level of each by analysing the behaviour with respect to decompositions. Finally, it rescales both to the minimum of the integration levels. Thus it gives exactly what is called $\phi^{\max}$ in [Ton15]. Using, finally, the definition of the intensity of the product $PE(S) \times PE(S)$ in Definition 4, this implies

$$\| C_{S,s}(M) \| = \phi^{\max}(M). \tag{39}$$

The concept of $M$ in our formalization is given by the tuple

$$C_{S,s}(M) = \big( (\mathrm{caus}_s(M, P^c), \phi^{\max}(M)), \; (\mathrm{eff}_s(M, P^e), \phi^{\max}(M)) \big),$$

i.e. the pair of maximally irreducible repertoires scaled by $\phi^{\max}(M)$. This is equivalent to what is called a concept, or sometimes quale sensu stricto, in classical IIT [Ton15], and denoted as $q(y_t)$.
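The rescaling of a cause-effect pair to the minimum of the two integration levels can be sketched in a few lines. The representation of an element of the experience space as a pair `(repertoire, intensity)`, the use of the maximum as the intensity of a product, and the names `scale_pair` and `intensity` are assumptions of this sketch, not definitions quoted from the text.

```python
def scale_pair(cause, effect, phi_cause, phi_effect):
    """Rescale both repertoires to the minimum of the two integration levels.

    Elements are represented as (repertoire, intensity) pairs (an assumption).
    """
    phi = min(phi_cause, phi_effect)
    return (cause, phi), (effect, phi)


def intensity(concept):
    """Intensity of a pair of elements, taken here as the maximum (an assumption)."""
    return max(r for _, r in concept)


# Hypothetical core cause and core effect repertoires with their integration levels.
concept = scale_pair([1.0, 0.0], [0.5, 0.5], phi_cause=0.5, phi_effect=0.25)
print(intensity(concept))
```

Because both components carry the same intensity after rescaling, the intensity of the pair coincides with the smaller of the two integration levels, here `0.25`.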
We finally remark that it is also possible in classical IIT that a cause repertoire value $\mathrm{caus}_s(M, P)$ vanishes (Remark 12). In our formalization, it would hence be represented by $(\omega_S, 0)$ in $PE(S)$, so that $d(\mathrm{caus}_s(M, P), q) = 0$ for all $q \in E(S)$ according to (1), which certainly ensures that $\phi^{\mathrm{MIP}}_{\mathrm{cause}}(M, P) = 0$.
A.4. Algorithm - System Level. We finally explain how the system level definitions correspond to the usual definition of classical IIT. The Q-shape $Q_s(S)$ is the collection of all concepts specified by the mechanisms of a system. Since each concept has intensity given by the corresponding integrated information of the mechanism, this makes $Q_s(S)$ what is usually called the conceptual structure or cause-effect structure. In [OAT14], one does not include a concept for any mechanism $M$ with $\phi^{\max}(M) = 0$. This manual exclusion is unnecessary in our case because the mathematical structure of experience spaces implies that mechanisms with $\phi^{\max}(M) = 0$ should be interpreted as having no conscious experience, and the algorithm in fact implies that they have 'no effect'. Indeed we will now see that they do not contribute to the distances in $E(S)$ or any $\Phi$ values, and so we do not manually exclude them.
When comparing $Q_s(S)$ with the Q-shape (13) obtained after replacing $S$ by any of its cuts, it is important to note that both are elements of $E(S)$ defined in (12), which is a product of experience spaces. According to Definition 4, the distance function on this product is

$$\begin{aligned} d\big(Q_s(S), Q_{s^z}(S^z)\big) &= \sum_{M} d\big(C_{S,s}(M), C_{S^z,s^z}(M)\big), \\ d\big(C_{S,s}(M), C_{S^z,s^z}(M)\big) &= \phi^{\max}(M) \cdot \Big( d\big(\mathrm{caus}_s(M, P^c), \mathrm{caus}^z_s(M, P^{z,c})\big) + d\big(\mathrm{eff}_s(M, P^e), \mathrm{eff}^z_s(M, P^{z,e})\big) \Big), \end{aligned} \tag{40}$$

where the sum runs over the mechanisms $M$ of $S$, where $\phi^{\max}(M)$ denotes the integrated information of the concept in the original system $S$, and where the right-hand cause and effect repertoires, marked by the superscript $z$, are those of $S^z$ at its own core causes and effects for $M$. The factor $\phi^{\max}(M)$ ensures that the distance used here corresponds precisely to the distance used in [OAT14], there called the extended Earth Mover's Distance. If the integrated information $\phi^{\max}(M)$ of a mechanism is zero, it follows that $d(C_{S,s}(M), C_{S^z,s^z}(M)) = 0$ as mentioned above, so that this concept does not contribute.
We remark that in [MMA+18, S1], an additional step is mentioned which is not described in any of the other papers we consider. Namely, if the integrated information of a mechanism is non-zero before cutting but zero after cutting, what is compared is not the distance of the corresponding concepts as in (40), but in fact the distance of the original concept with a special null concept, defined to be the unconstrained repertoire of the cut system. We have not included this step in our definitions, but it could be included by adding a choice of distinguished points to Example 3 and redefining the metric correspondingly.
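The role of the intensity factor in the comparison of Q-shapes can be made concrete with a small sketch. It assumes that elements are `(repertoire, intensity)` pairs, that the distance between two elements is the base distance weighted by the intensity of the first element, and that distances add up over the components of a product; the total variation distance again stands in for the Earth Mover's Distance. All names and numbers are hypothetical.

```python
def total_variation(p, q):
    """Stand-in base distance (assumption): half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def element_distance(x, y, base):
    """Base distance weighted by the intensity of the first element (assumption)."""
    (p, r), (q, _) = x, y
    return r * base(p, q)


def concept_distance(c, c_cut, base):
    """Sum of the cause and effect contributions of one concept."""
    return sum(element_distance(x, y, base) for x, y in zip(c, c_cut))


def q_shape_distance(Q, Q_cut, base):
    """Sum of the concept distances over all mechanisms."""
    return sum(concept_distance(Q[m], Q_cut[m], base) for m in Q)


# Toy Q-shapes: mechanism "A" has intensity 0.5, mechanism "B" has intensity 0.
Q = {
    "A": (([1.0, 0.0], 0.5), ([0.5, 0.5], 0.5)),
    "B": (([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)),
}
Q_cut = {
    "A": (([0.5, 0.5], 0.25), ([0.5, 0.5], 0.25)),
    "B": (([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0)),
}
print(q_shape_distance(Q, Q_cut, total_variation))
```

Although the repertoires of mechanism "B" differ maximally between the two Q-shapes, its vanishing intensity removes its contribution entirely, so no manual exclusion is needed in this sketch.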
In Equation (14) the above comparison is conducted for every subsystem of a system $S$. The subsystems of $S$ are what are called candidate systems in [OAT14], and they describe that 'part' of the system that is going to be conscious according to the theory (cf. below). Crucially, candidate systems are subsystems of $S$, whose time evolution is defined in (22). This definition ensures that the elements of $S$ which are not part of the candidate system are fixed in their current state, i.e. constitute background conditions as required in the contemporary version of classical IIT [MMA+18].
Equation (14) then compares the Q-shape of every candidate system to the Q-shape of all of its cuts, using the distance function described above, where the cuts are defined in (23). The cut system with the smallest distance gives the system-level minimum information partition and the integrated (conceptual) information of that candidate system, denoted as $\Phi(x_t)$ in [Ton15].
The core integration scaling finally picks out that candidate system with the largest integrated information value. This candidate system is the major complex $M$ of $S$, the part of $S$ which is conscious according to the theory as part of the exclusion postulate of IIT. Its Q-shape is the maximally irreducible conceptual structure (MICS), also called quale sensu lato. The overall integrated conceptual information is, finally, simply the intensity of $E(S, s)$ as defined in (14),

$$E(S, s) = \frac{\Phi(M, m)}{\| Q_m(M) \|} \cdot Q_m(M). \tag{41}$$

This encodes the Q-shape $Q_m(M)$, i.e. the maximally irreducible conceptual structure of the major complex, which is taken to describe the quality of conscious experience. By construction it also encodes the integrated conceptual information of the major complex, which captures its intensity, since we have $\| E(S, s) \| = \Phi(M, m)$. The rescaling of $Q_m(M)$ in (41) leaves the relative intensities of the concepts in the MICS intact. Thus $E(S, s)$ is the constellation of concepts in qualia space $E(M)$ of [OAT14].
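The system-level selection described in this appendix, a minimum over cuts followed by a maximum over candidate systems, can be sketched as follows. The distances between each candidate system's Q-shape and the Q-shapes of its cuts are taken as given; the candidate labels, cut labels, numbers and function names are hypothetical.

```python
def big_phi(cut_distances):
    """Return the cut with the smallest Q-shape distance, and that distance."""
    cut = min(cut_distances, key=cut_distances.get)
    return cut, cut_distances[cut]


def major_complex(candidates):
    """Return the candidate system with the largest integrated information."""
    phis = {name: big_phi(d)[1] for name, d in candidates.items()}
    best = max(phis, key=phis.get)
    return best, phis[best]


# Hypothetical Q-shape distances of three candidate systems to their cuts.
candidates = {
    "AB": {"A|B": 0.5, "B|A": 0.25},
    "ABC": {"A|BC": 1.0, "AB|C": 0.75, "C|AB": 1.5},
    "BC": {"B|C": 0.125, "C|B": 0.5},
}
mip = big_phi(candidates["ABC"])
complex_name, phi_value = major_complex(candidates)
print(mip, complex_name, phi_value)
```

Here the cut returned by `big_phi` corresponds to the system-level minimum information partition of a candidate system, and the candidate returned by `major_complex` corresponds to the major complex. As in the mechanism-level case, both selections may fail to be unique when several cuts or candidates attain the same value.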