Compositional data such as proportions, concentrations, counts and fluxes of properties are the data most commonly collected in ecology and agronomy. However, their statistical analysis is prone to bias, which often leads to conflicting interpretations. Compositional data are strictly positive, multivariate and interrelated data constrained to some whole, such as an ecosystem, a soil-plant-animal subsystem, an animal diet, or the dry mass of a tissue or soil sample. Closure to that whole confers special properties and generates spurious correlations through redundancy (any one component can be computed as the difference between the whole and the sum of the others) and sub-compositional incoherence (results depend on the measurement basis, e.g., whether the 100% closure refers to the total element content or to the mineral, organic, wet or dry mass of a tissue or soil sample). A D-part composition therefore has only D-1 degrees of freedom: the covariance matrix of a closed system is of rank D-1, not D. Because compositional data are commonly modelled with the logistic-normal distribution, confidence intervals derived from that model cannot extend beyond the compositional space (below 0% or above 100%).
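The closure effects described above can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available; the `closure` helper and the simulated data are hypothetical and serve only as an illustration. It closes independently generated positive quantities to 100% and shows the resulting redundancy, the zero-sum rows of the covariance matrix that force negative covariances, and a covariance rank of D-1.

```python
import numpy as np

def closure(x, total=100.0):
    """Rescale each row of a strictly positive array so that its parts sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=1, keepdims=True)

# Hypothetical data: 50 samples of D = 4 independent, strictly positive quantities
rng = np.random.default_rng(seed=1)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(50, 4))
comp = closure(raw)  # every row now sums to 100 %

# Redundancy: the last part equals the whole minus the sum of the others
print(np.allclose(comp[:, 3], 100.0 - comp[:, :3].sum(axis=1)))  # True

# Each row of the covariance matrix sums to zero, so every part is forced to
# covary negatively with at least one other part, even though the raw
# quantities were generated independently
cov = np.cov(comp, rowvar=False)
print(np.allclose(cov.sum(axis=1), 0.0))  # True

# Only D - 1 degrees of freedom remain; an explicit tolerance absorbs round-off
print(np.linalg.matrix_rank(cov, tol=1e-8))  # 3
```

The negative covariances here are produced by closure alone, which is the kind of spurious correlation that raw-scale statistics cannot distinguish from a real ecological interaction.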
Compositional data analysis (CoDa) provides tools to avoid these methodological biases. The most promising is the orthonormal balance, computed as an isometric log-ratio (ilr). The ilr transformation turns the D parts of a compositional vector into D-1 linearly independent variables. The concept can be illustrated by a mobile whose hierarchically arranged balances describe the structure and functions of living systems: the ilrs are computed at the fulcrums, while the proportions, concentrations and counts hang below in the buckets. Balances can be projected into Euclidean space as Cartesian coordinates, where distances between initial, reference or expected system states can be calculated to monitor changes of state. Ad hoc orthonormal balances can address stoichiometric rules, ionomics, animal feeding-ratio rules, and dual ratios in soil, water and plant systems. Many equations in soil physics could be reformulated to account for the compositional nature of air, water and solid volumes and to describe fluxes of matter and energy through the soil. Indices used to synthesize raw data in the pre-compositional era could also be revisited with CoDa tools.
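The mobile analogy can be made concrete with a short Python sketch, assuming NumPy and the standard balance formula b = sqrt(r*s/(r+s)) * ln(g(x_plus)/g(x_minus)), where g is the geometric mean of the r parts on the + side and the s parts on the - side of a fulcrum. The sequential binary partition, the part names (N, P, K and a filling value Fv completing the sum to 100%), the helper functions and all numbers are hypothetical and chosen for illustration only. The sketch computes D-1 = 3 balances for a 4-part composition and the Euclidean distance between an observed and a reference state.

```python
import numpy as np

def ilr_balances(x, sbp):
    """Orthonormal balances of one composition `x` for a sequential binary partition.

    Each SBP row marks parts with +1 (numerator), -1 (denominator) or 0 (absent).
    """
    logx = np.log(np.asarray(x, dtype=float))
    balances = []
    for row in np.asarray(sbp):
        plus, minus = row == 1, row == -1
        r, s = plus.sum(), minus.sum()
        # ln of the ratio of geometric means = difference of mean logs
        log_ratio = logx[plus].mean() - logx[minus].mean()
        balances.append(np.sqrt(r * s / (r + s)) * log_ratio)
    return np.array(balances)

def clr(x):
    """Centred log-ratio, used here only to check the isometry."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def with_filling_value(parts, total=100.0):
    """Hypothetical helper: append Fv = total - sum(parts) to close the system."""
    parts = np.asarray(parts, dtype=float)
    return np.append(parts, total - parts.sum())

# Hypothetical mobile for parts [N, P, K, Fv]
sbp = np.array([
    [1, 1, 1, -1],  # fulcrum 1: {N, P, K} vs Fv
    [1, 1, -1, 0],  # fulcrum 2: {N, P} vs K
    [1, -1, 0, 0],  # fulcrum 3: N vs P
])

observed = with_filling_value([2.5, 0.25, 1.8])   # illustrative values, %
reference = with_filling_value([3.0, 0.30, 2.0])  # illustrative values, %

b_obs = ilr_balances(observed, sbp)
b_ref = ilr_balances(reference, sbp)
print(b_obs.shape)  # (3,)

# Euclidean distance between system states in balance space
dist = np.linalg.norm(b_obs - b_ref)
print(np.isclose(dist, np.linalg.norm(clr(observed) - clr(reference))))  # True

# Balances ignore the closure constant (%, g/kg or raw units give the same result)
print(np.allclose(ilr_balances(10 * observed, sbp), b_obs))  # True
```

Because the partition is orthonormal, the Euclidean distance between balance vectors equals the Aitchison distance computed from centred log-ratios; any other orthonormal partition of the same parts would give the same distance while changing the interpretation of each individual fulcrum.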
The aim of this Research Topic is to stimulate new ecological and agronomic thinking about the intrinsic multivariate nature and inherent structure of compositional data. We hope that this Research Topic will foster more consistent interpretations of laboriously collected data and the elaboration of better-founded theories in ecology and agronomy. This paradigm shift is essential now that high-quality big data are increasingly accessible to support evidence-based ecology and agriculture with the best multivariate tools available.