Composition Statistics (composition_stats)

This module provides functions for compositional data analysis.

Many ’omics datasets are inherently compositional, meaning that they are best interpreted as proportions or percentages rather than absolute counts.

Formally, \(x\) is a composition with \(D\) components if \(\sum_{i=1}^D x_{i} = c\) and \(x_{i} > 0\) for \(1 \leq i \leq D\), where \(c\) is a real-valued constant. In this module, \(c=1\). Compositional data can be analyzed using Aitchison geometry [1].
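The constraint above can be illustrated with a minimal NumPy sketch that rescales raw counts so that each row sums to \(c=1\). The helper name `closure_sketch` and the sample counts are hypothetical; this is not the module's own `closure` implementation.

```python
import numpy as np


def closure_sketch(counts):
    """Hypothetical helper: rescale each row so its components sum to 1."""
    counts = np.asarray(counts, dtype=float)
    if np.any(counts < 0):
        raise ValueError("components must be non-negative")
    totals = counts.sum(axis=-1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("a composition cannot sum to zero")
    return counts / totals


# Two samples with different sequencing depths (made-up numbers).
raw_counts = np.array([[2.0, 2.0, 6.0],
                       [10.0, 30.0, 60.0]])
composition = closure_sketch(raw_counts)
```

After closure, only the relative sizes of the components remain; the total count of each sample is discarded.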

However, in this framework, the standard Euclidean operations of addition and scalar multiplication no longer apply. Their Aitchison counterparts, perturbation and power, are used to manipulate compositions instead.
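As a rough illustration, perturbation multiplies two compositions component-wise and re-closes the result, while power raises each component to a scalar exponent and re-closes. The sketch below assumes these standard Aitchison definitions; the names `perturb_sketch`, `power_sketch` and `_close` are hypothetical and are not the module's functions.

```python
import numpy as np


def _close(v):
    """Hypothetical helper: rescale so the components sum to 1."""
    v = np.asarray(v, dtype=float)
    return v / v.sum(axis=-1, keepdims=True)


def perturb_sketch(x, y):
    """Perturbation: component-wise product followed by closure."""
    return _close(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))


def power_sketch(x, a):
    """Power: component-wise exponentiation followed by closure."""
    return _close(np.asarray(x, dtype=float) ** a)


x = _close([1.0, 2.0, 7.0])
y = _close([1.0, 1.0, 2.0])
uniform = _close([1.0, 1.0, 1.0])

shifted = perturb_sketch(x, y)        # plays the role of "x + y"
unchanged = perturb_sketch(x, uniform)  # the uniform composition acts as zero
doubled = power_sketch(x, 2)          # plays the role of "2 * x"
```

In this sketch, the uniform composition is the neutral element of perturbation, and raising to the power 2 agrees with perturbing `x` by itself, mirroring `x + x = 2 * x` in Euclidean space.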

This module supports two styles of manipulating compositional data. The first works directly with the perturbation and power operations, which can be useful for simulation studies. The second transforms compositional data into real space. Currently, the centre log ratio transform (clr) and the isometric log ratio transform (ilr) [2] can be used to accomplish this. These transforms are useful for applying standard statistical tools such as parametric hypothesis tests, regressions and more.
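To make the second style concrete, here is a minimal sketch of the centre log ratio transform under its usual definition: take logarithms and subtract their mean, so that the result lives in real space and sums to zero. The names `clr_sketch` and `clr_inv_sketch` are hypothetical stand-ins, not the module's `clr` and `clr_inv`.

```python
import numpy as np


def clr_sketch(x):
    """Centre log ratio: log of each part minus the mean log."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean(axis=-1, keepdims=True)


def clr_inv_sketch(z):
    """Inverse clr: exponentiate, then rescale to sum to 1."""
    e = np.exp(np.asarray(z, dtype=float))
    return e / e.sum(axis=-1, keepdims=True)


x = np.array([0.1, 0.2, 0.7])
y = np.array([0.25, 0.25, 0.5])

z = clr_sketch(x)
round_trip = clr_inv_sketch(z)

# Perturbing x by y (component-wise product, then closure) ...
perturbed = x * y / np.sum(x * y)
# ... corresponds to ordinary vector addition after the transform.
added_in_clr_space = clr_sketch(x) + clr_sketch(y)
```

The last two lines show why the transform is convenient: perturbation in the simplex becomes plain addition of the transformed vectors, so familiar Euclidean methods can be applied to them.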

The major caveat of using this framework is dealing with zeros. In the Aitchison geometry, only compositions with nonzero components can be considered. The multiplicative replacement technique [3] can be used to substitute these zeros with small pseudocounts without introducing major distortions to the data.
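A minimal sketch of the idea behind multiplicative replacement, assuming compositions closed to 1: each zero is set to a small value `delta`, and the nonzero parts are shrunk by a common factor so the total stays at 1. The name `replace_zeros_sketch` is hypothetical, and the fallback `delta = (1/D)**2` is an assumption of this sketch rather than a documented default.

```python
import numpy as np


def replace_zeros_sketch(x, delta=None):
    """Hypothetical helper: swap zeros for delta, rescale the rest."""
    x = np.asarray(x, dtype=float)
    x = x / x.sum(axis=-1, keepdims=True)
    if delta is None:
        # Assumed fallback for this sketch only.
        delta = (1.0 / x.shape[-1]) ** 2
    zero_mask = x == 0
    n_zeros = zero_mask.sum(axis=-1, keepdims=True)
    # Mass handed to the zeros is taken proportionally from nonzero parts.
    shrink = 1.0 - n_zeros * delta
    if np.any(shrink <= 0):
        raise ValueError("delta is too large for the number of zeros")
    return np.where(zero_mask, delta, x * shrink)


x = np.array([0.2, 0.0, 0.8])
replaced = replace_zeros_sketch(x, delta=0.01)
```

Because every nonzero part is multiplied by the same factor, the ratios between nonzero parts are unchanged (here 0.2 to 0.8 stays at 1 to 4), which is why the distortion is small.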

Functions

closure(mat, *[, out])

Performs closure to ensure that all elements add up to 1.

multiplicative_replacement(mat[, delta])

Replaces all zeros with small non-zero values.

perturb(x, y)

Performs the perturbation operation.

perturb_inv(x, y)

Performs the inverse perturbation operation.

power(x, a)

Performs the power operation.

inner(x, y)

Calculates the Aitchison inner product.

clr(mat)

Performs centre log ratio transformation.

clr_inv(mat)

Performs inverse centre log ratio transformation.

ilr(mat[, basis, check])

Performs isometric log ratio transformation.

ilr_inv(mat[, basis, check])

Performs inverse isometric log ratio transform.

alr(mat[, denominator_idx])

Performs additive log ratio transformation.

alr_inv(mat[, denominator_idx])

Performs inverse additive log ratio transform.

center(mat)

Compute the geometric average of data.

centralize(mat)

Center data around its geometric average.

sbp_basis(sbp)

Builds an orthogonal basis from a sequential binary partition (SBP).

References

1

V. Pawlowsky-Glahn, J. J. Egozcue, R. Tolosana-Delgado (2015), Modeling and Analysis of Compositional Data, Wiley, Chichester, UK

2

J. J. Egozcue, “Isometric Logratio Transformations for Compositional Data Analysis”, Mathematical Geology, 35.3 (2003)

3

J. A. Martin-Fernandez, “Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation”, Mathematical Geology, 35.3 (2003)

Examples

>>> import numpy as np
>>> from composition_stats import closure, perturb

Consider a very simple environment with only 3 species. The species in the environment are equally distributed and their proportions are equivalent:

>>> otus = np.array([1./3, 1./3, 1./3])

Suppose that an antibiotic kills off half of the population for the first two species, but doesn’t harm the third species. Then the perturbation vector, after closure, would be as follows:

>>> antibiotic = closure(np.array([1./2, 1./2, 1]))

The resulting perturbed composition would be:

>>> perturb(otus, antibiotic)
array([ 0.25,  0.25,  0.5 ])
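The result can be cross-checked with plain NumPy, and the original community can be recovered by dividing out the antibiotic effect. This is a sketch using the standard definitions (component-wise product or quotient followed by closure); the helper `_close` is hypothetical and the module's own `perturb` and `perturb_inv` are not called here.

```python
import numpy as np


def _close(v):
    """Hypothetical helper: rescale so the components sum to 1."""
    v = np.asarray(v, dtype=float)
    return v / v.sum(axis=-1, keepdims=True)


otus = np.array([1. / 3, 1. / 3, 1. / 3])
antibiotic = _close([1. / 2, 1. / 2, 1.])

# Perturbation: multiply component-wise, then close.
after_treatment = _close(otus * antibiotic)

# Inverse perturbation: divide component-wise, then close.
recovered = _close(after_treatment / antibiotic)
```

Here `after_treatment` matches the proportions shown above, and `recovered` returns to the equal thirds the example started with.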