utlvce.Model

The utlvce.Model class is a representation of a linear Gaussian structural causal model with latent effects. It is used throughout the code, e.g. in the alternating optimization procedure implemented in utlvce.score and to generate synthetic data (see utlvce.Model.sample()).

The class also implements the __str__ method; calling print(model) prints a human-readable representation of the model parameters and the assumption deviation metrics.

class utlvce.Model(A, B, gamma, omegas, psis)

The utlvce.Model class holds the parameters of the model and offers additional functionality such as checking deviation from assumptions or generating intermediate quantities used in the alternating optimization procedure. It also allows generating data according to the model (see sample() below).

It defines the following parameters:

Parameters

p (int) – The number of observed variables in the model.
l (int) – The number of latent variables in the model.
e (int) – The number of environments in the model.
A (numpy.ndarray) – The p x p adjacency matrix of the DAG underlying the model, where A[i,j] != 0 implies i -> j.
B (numpy.ndarray) – The p x p connectivity (edge weights) matrix. Follows the sparsity pattern of A.
gamma (numpy.ndarray) – The l x p matrix of latent effects, i.e. connectivity matrix from latent to observed variables, where gamma[i,j] != 0 implies i -> j.
omegas (numpy.ndarray) – The e x p matrix containing the variances of the observed variables’ noise terms.
psis (numpy.ndarray) – The e x l array with the variances of the latent variables for each environment.

__init__(A, B, gamma, omegas, psis)

Create a new instance of a model.

Parameters

A (numpy.ndarray) – The p x p adjacency matrix of the given DAG, where A[i,j] != 0 implies i -> j.
B (numpy.ndarray) – The p x p connectivity (weight) matrix.
gamma (numpy.ndarray) – The l x p matrix of latent effects, i.e. connectivity matrix from latent to observed variables, where gamma[i,j] != 0 implies i -> j.
omegas (numpy.ndarray) – An e x p matrix containing the variances of the observed variables’ noise terms.
psis (numpy.ndarray) – An e x l array with the variances of the latent variables for each environment.

Return type

NoneType

Raises

ValueError : – If A is not the adjacency matrix of a DAG, if B does not respect the sparsity pattern of A, or if the dimensions of the parameters are not compatible.

Examples

Creating an instance of a model with 3 observed variables, 2 latents and 5 environments.

>>> rng = np.random.default_rng(42)
>>> A = np.array([[0,0,1], [0,0,1], [0,0,0]])
>>> B = np.array([[0,0,0.5], [0,0,3], [0,0,0]])
>>> gamma = rng.uniform(size=(2,3))
>>> omegas = rng.uniform(size=(5,3))
>>> psis = rng.uniform(size=(5,2))
>>> model = Model(A, B, gamma, omegas, psis)

>>> model.A
array([[0, 0, 1],
       [0, 0, 1],
       [0, 0, 0]])

>>> model.B
array([[0. , 0. , 0.5],
       [0. , 0. , 3. ],
       [0. , 0. , 0. ]])

>>> model.gamma
array([[0.77395605, 0.43887844, 0.85859792],
       [0.69736803, 0.09417735, 0.97562235]])

>>> model.omegas
array([[0.7611397 , 0.78606431, 0.12811363],
       [0.45038594, 0.37079802, 0.92676499],
       [0.64386512, 0.82276161, 0.4434142 ],
       [0.22723872, 0.55458479, 0.06381726],
       [0.82763117, 0.6316644 , 0.75808774]])

>>> model.psis
array([[0.35452597, 0.97069802],
       [0.89312112, 0.7783835 ],
       [0.19463871, 0.466721  ],
       [0.04380377, 0.15428949],
       [0.68304895, 0.74476216]])

>>> model.p
3

>>> model.e
5

>>> model.l
2

>>> model.I_B
array([[ 1. ,  0. ,  0. ],
       [ 0. ,  1. ,  0. ],
       [-0.5, -3. ,  1. ]])
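
The I_B attribute is not described in the parameter list. In this example its printed value coincides with the transpose of I - B, which the following NumPy sketch checks. Treating that relation as the definition of I_B is an assumption read off the output above, not something this documentation states.

```python
import numpy as np

# Edge weights from the example above
B = np.array([[0, 0, 0.5], [0, 0, 3], [0, 0, 0]])

# Assumed relation (inferred from the printed value): I_B = (I - B)^T
I_B = (np.eye(B.shape[0]) - B).T
print(I_B)
```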

When A is not a DAG:

>>> bad_A = np.array([[0,0,1], [0,0,1], [1,0,0]])
>>> Model(bad_A, B, gamma, omegas, psis)
Traceback (most recent call last):
...
ValueError: A does not correspond to a DAG.

When B does not match the sparsity pattern:

>>> Model(A, B.T, gamma, omegas, psis)
Traceback (most recent call last):
...
ValueError: B does not respect sparsity pattern in A.

When the dimensions of the parameters are incompatible:

>>> bad_B = np.array([[0,0.5], [0,3]])
>>> Model(A, bad_B, gamma, omegas, psis)
Traceback (most recent call last):
...
ValueError: A and B have different dimensions.

>>> bad_gamma = rng.uniform(size=(2,4))
>>> Model(A, B, bad_gamma, omegas, psis)
Traceback (most recent call last):
...
ValueError: The sizes of A and gamma are not compatible.

>>> bad_omegas = rng.uniform(size=(5,2))
>>> Model(A, B, gamma, bad_omegas, psis)
Traceback (most recent call last):
...
ValueError: The sizes of A and omegas are not compatible.

>>> bad_psis = rng.uniform(size=(4,2))
>>> Model(A, B, gamma, omegas, bad_psis)
Traceback (most recent call last):
...
ValueError: The sizes of omegas and psis are not compatible.

>>> bad_psis = rng.uniform(size=(5,3))
>>> Model(A, B, gamma, omegas, bad_psis)
Traceback (most recent call last):
...
ValueError: The sizes of gamma and psis are not compatible.
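
The first two checks behind these errors can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper names is_dag and respects_sparsity are hypothetical, and the acyclicity test relies on the fact that a graph on p nodes has no directed cycle exactly when the p-th power of its 0/1 adjacency matrix is zero. It is meant for small graphs, where the integer walk counts cannot overflow.

```python
import numpy as np

def is_dag(A):
    """Hypothetical helper: True if the nonzero pattern of A has no directed cycle."""
    P = (np.asarray(A) != 0).astype(int)
    # Entry (i, j) of P^p counts directed walks of length p from i to j;
    # on p nodes such a walk must revisit a node, so it exists only if there is a cycle.
    return not np.linalg.matrix_power(P, P.shape[0]).any()

def respects_sparsity(A, B):
    """Hypothetical helper: True if B is zero wherever A is zero."""
    return not np.any((np.asarray(B) != 0) & (np.asarray(A) == 0))

A = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 0]])
B = np.array([[0, 0, 0.5], [0, 0, 3], [0, 0, 0]])
bad_A = np.array([[0, 0, 1], [0, 0, 1], [1, 0, 0]])  # adds the edge 2 -> 0, closing a cycle

print(is_dag(A), is_dag(bad_A))
print(respects_sparsity(A, B), respects_sparsity(A, B.T))
```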

copy()

Returns a copy of the current model.

Returns: copy – A copy of this object. All contained arrays are copied using ndarray.copy().
Return type: Model

Example

>>> model.psis
array([[0.35452597, 0.97069802],
       [0.89312112, 0.7783835 ],
       [0.19463871, 0.466721  ],
       [0.04380377, 0.15428949],
       [0.68304895, 0.74476216]])
>>> copy = model.copy()
>>> copy.psis
array([[0.35452597, 0.97069802],
       [0.89312112, 0.7783835 ],
       [0.19463871, 0.466721  ],
       [0.04380377, 0.15428949],
       [0.68304895, 0.74476216]])

covariances()

The covariance matrices of the observed variables, as entailed by the model in each environment.

Returns: covariances – An e x p x p array containing the covariance matrices, one per environment.
Return type: numpy.ndarray

Example

>>> model.covariances() 
array([[[ 1.44557555,  0.18417459,  2.17133182],
        [ 0.18417459,  0.86296055,  2.90375069],
        [ 2.17133182,  2.90375069, 12.22668828]],
...
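
The first matrix above can be reproduced by hand. The sketch below assumes the structural equation X = B^T X + noise, so that X = (I - B^T)^{-1} noise, and takes the noise-term covariance of the first environment from the noise_term_covariances() example in this document. It illustrates the relation between the two quantities and is not the library's code.

```python
import numpy as np

B = np.array([[0, 0, 0.5], [0, 0, 3], [0, 0, 0]])

# Noise-term covariance of the first environment, copied from the
# noise_term_covariances() example in this document
N0 = np.array([[1.44557555, 0.18417459, 0.89602027],
               [0.18417459, 0.86296055, 0.22278173],
               [0.89602027, 0.22278173, 1.31341498]])

# Assumed model: X = (I - B^T)^{-1} noise, hence Cov(X) = T N0 T^T
T = np.linalg.inv(np.eye(3) - B.T)
sigma0 = T @ N0 @ T.T
print(sigma0)
```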

intervention_strength()

Assumption deviation metric: approximate knowledge of the latent variables is enough for identifiability as long as the interventions on the observed variables are strong. This metric therefore measures the strength of the interventions. See section 3.5 of the paper for more information.

Returns: metric – An array of floats with as many entries as variables (p) in the model, indicating the strength of the interventions over each observed variable.
Return type: numpy.ndarray

Examples

>>> model.intervention_strength()
array([0.0578593 , 0.03265554, 0.12387794])

inv_noise_term_covariances()

Compute the inverse noise-term covariance matrix, denoted M, for each environment.

Returns: Ms – A e x p x p array containing the inverse noise-term covariance matrices, one per environment.
Return type: numpy.ndarray

Example

>>> model.inv_noise_term_covariances() 
array([[[ 1.20040982, -0.04683013, -0.81098407],
        [-0.04683013,  1.21369509, -0.1739194 ],
        [-0.81098407, -0.1739194 ,  1.34413286]],
...
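
As a quick consistency check, the matrix shown above is the ordinary matrix inverse of the first noise-term covariance matrix printed in the noise_term_covariances() example. The sketch below verifies this with NumPy; the input values are copied from this document and are therefore rounded to eight decimals.

```python
import numpy as np

# First-environment noise-term covariance, copied from the
# noise_term_covariances() example in this document
N0 = np.array([[1.44557555, 0.18417459, 0.89602027],
               [0.18417459, 0.86296055, 0.22278173],
               [0.89602027, 0.22278173, 1.31341498]])

M0 = np.linalg.inv(N0)
print(M0)
```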

noise_term_covariances()

Compute the noise-term covariance matrix for each environment.

Returns: noise_term_covariances – An e x p x p array containing the noise-term covariance matrices, one per environment.
Return type: numpy.ndarray

Example

>>> model.noise_term_covariances()  
array([[[1.44557555, 0.18417459, 0.89602027],
        [0.18417459, 0.86296055, 0.22278173],
        [0.89602027, 0.22278173, 1.31341498]],
...
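
The printed values are consistent with the noise term of each environment combining the latent effects and the observed-variable noise, i.e. gamma^T diag(psi) gamma + diag(omega). The sketch below checks this for the first environment using the parameter values from the __init__ example. The formula is inferred from those numbers and should be read as an assumption rather than as the library's code.

```python
import numpy as np

# Parameters of the first environment, copied from the __init__ example
gamma = np.array([[0.77395605, 0.43887844, 0.85859792],
                  [0.69736803, 0.09417735, 0.97562235]])
omega0 = np.array([0.7611397, 0.78606431, 0.12811363])  # model.omegas[0]
psi0 = np.array([0.35452597, 0.97069802])               # model.psis[0]

# Assumed formula: latent contribution plus diagonal noise variances
N0 = gamma.T @ np.diag(psi0) @ gamma + np.diag(omega0)
print(N0)
```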

sample(n_obs, compute_covs=False, random_state=42)

Generate a multi-environment sample from the model.

Parameters

n_obs (int or array-like of ints) – The number of observations to generate from each environment. If a single number is passed, generate this number of observations for all environments.
compute_covs (bool, default=False) – Whether to additionally compute the sample covariances of the generated samples.
random_state (NoneType or int, default=42) – To set the random state for reproducibility. If None, subsequent calls will yield different samples.

Returns

X (list of numpy.ndarray) – A list containing the sample from each environment.
sample_covariances (numpy.ndarray) – A 3-dimensional array containing the estimated sample covariances of the observed variables for each environment. Returned only if compute_covs=True.
n_obs (numpy.ndarray of ints) – The number of observations available from each environment (i.e. the sample size). Returned only if compute_covs=True.

Raises

ValueError : – If the values passed for n_obs are not positive, if the length of n_obs does not match the number of environments, or if compute_covs=True and a single observation is sampled from any of the environments (i.e. the covariance matrix cannot be computed).
TypeError : – If n_obs is not an integer or a list of integers.

Examples

Generating a random sample:

>>> model.sample(10) 
[array([[-1.30178026, -0.03529043, -0.90999532],
...

Additionally computing the sample covariances:

>>> X, covariances, n_obs = model.sample(10, compute_covs=True)
>>> n_obs
array([10, 10, 10, 10, 10])
>>> covariances 
array([[[ 1.70009515,  0.05345342,  2.35152191],
        [ 0.05345342,  0.21506513,  0.67053129],
        [ 2.35152191,  0.67053129,  5.20810612]],
...

We cannot compute the sample covariances when the sample contains a single observation:

>>> model.sample(1) 
[array([[-1.30178026, -0.03529043, -0.90999532]]),...
>>> model.sample(1, compute_covs=True)
Traceback (most recent call last):
...
ValueError: Cannot compute sample covariances for a single observation.
>>> model.sample([1,2,3,4,5], compute_covs=True)
Traceback (most recent call last):
...
ValueError: Cannot compute sample covariances for a single observation.

Specifying a different number of observations per environment:

>>> model.sample([2,3,4,5,6]) 
[array([[-1.30178026, -0.03529043, -0.90999532],
...

Examples of failure (ValueError):

>>> model.sample([1,2])
Traceback (most recent call last):
...
ValueError: n_obs has the wrong length.
>>> model.sample([-1,2,3,4,5])
Traceback (most recent call last):
...
ValueError: n_obs should be a positive integer or list of positive integers.
>>> model.sample([0,2,3,4,5])
Traceback (most recent call last):
...
ValueError: n_obs should be a positive integer or list of positive integers.
>>> model.sample(0)
Traceback (most recent call last):
...
ValueError: n_obs should be a positive integer or list of positive integers.

Examples of failure (TypeError):

>>> model.sample([1.0,2,3,4,5])
Traceback (most recent call last):
...
TypeError: n_obs should be a positive integer or list of positive integers.
>>> model.sample("a")
Traceback (most recent call last):
...
TypeError: n_obs should be a positive integer or list of positive integers.
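
For intuition, drawing observations from one environment of a linear Gaussian model with latent effects can be sketched in plain NumPy. This is an illustration under stated assumptions, not the implementation of sample(): the function name sample_environment is hypothetical, all variables are assumed to have zero mean, and the random number stream differs from the one used by the library, so the values will not match the outputs above.

```python
import numpy as np

def sample_environment(B, gamma, omega, psi, n, rng):
    """Hypothetical helper: draw n zero-mean observations from one environment."""
    p, l = B.shape[0], gamma.shape[0]
    H = rng.normal(size=(n, l)) * np.sqrt(psi)       # latent variables, variances psi
    eps = rng.normal(size=(n, p)) * np.sqrt(omega)   # observed-variable noise, variances omega
    noise = H @ gamma + eps                          # combined noise term, shape (n, p)
    # Row-wise form of X = (I - B^T)^{-1} noise
    return noise @ np.linalg.inv(np.eye(p) - B)

# Parameters of the first environment, copied from the __init__ example
B = np.array([[0, 0, 0.5], [0, 0, 3], [0, 0, 0]])
gamma = np.array([[0.77395605, 0.43887844, 0.85859792],
                  [0.69736803, 0.09417735, 0.97562235]])
omega0 = np.array([0.7611397, 0.78606431, 0.12811363])
psi0 = np.array([0.35452597, 0.97069802])

rng = np.random.default_rng(0)
X0 = sample_environment(B, gamma, omega0, psi0, 10, rng)
print(X0.shape)  # (10, 3)
```

With a large n, the empirical covariance np.cov(X0, rowvar=False) of such a sample approaches the first matrix shown in the covariances() example.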

scaled_latent_incoherence()

Assumption deviation metric: motivated by the incoherence (denseness) assumption of the latent effects, the measure computes the incoherence of the latent effects estimated by the model. See section 3.5 of the paper for more information.

The output is the latent incoherence scaled by the maximum degree of the moral graph.

Returns: metric – The latent incoherence metric of the model.
Return type: float

Examples

>>> model.scaled_latent_incoherence()
1.9462534859894438

score(sample_covariances, n_obs)

Compute the score of the model for the given sample covariances and number of observations from each environment.

Parameters

sample_covariances (numpy.ndarray) – A 3-dimensional array containing the estimated sample covariances of the observed variables for each environment.
n_obs (list of ints) – The number of observations available from each environment (i.e. the sample size).

Returns

score – The computed score.

Return type

float

Examples

>>> model.score(sample_covariances, n_obs) 
1.1070517672870...