utlvce.generators

The utlvce.generators module contains functions to generate random UT-LVCE models from given or random DAG adjacencies.

utlvce.generators.chain_graph_model(p, I, num_latent, e, var_lo, var_hi, int_var_lo, int_var_hi, psi_lo, psi_hi, int_psi_lo, int_psi_hi, B_lo, B_hi, sparse_latents=False, obs=True, random_state=42, verbose=0)

Generate a random model from a chain graph with p nodes.

Parameters
  • p (int) – The number of observed variables in the model.

  • I (set) – The set of intervention targets.

  • num_latent (int) – The number of latent variables in the model.

  • e (int) – The number of environments.

  • var_lo (float) – The lower bound for the variances of the noise terms of the observed variables.

  • var_hi (float) – The upper bound for the variances of the noise terms of the observed variables.

  • int_var_lo (float) – The lower bound for the intervention variances on the observed variables.

  • int_var_hi (float) – The upper bound for the intervention variances on the observed variables.

  • psi_lo (float) – The lower bound for the variances of the latent variables.

  • psi_hi (float) – The upper bound for the variances of the latent variables.

  • int_psi_lo (float) – The lower bound for the intervention variances on the latent variables.

  • int_psi_hi (float) – The upper bound for the intervention variances on the latent variables.

  • B_lo (float) – The lower bound for the edge weights between observed variables.

  • B_hi (float) – The upper bound for the edge weights between observed variables.

  • sparse_latents (bool, default=False) – Whether the gamma matrix of latent effects should be sparse (see source).

  • obs (bool, default=True) – Whether the first environment should be “observational”, i.e. whether the variances of the noise terms and latents should be lower (variable-wise) than in the other environments. With obs=True, the variances for the first environment are sampled from [var_lo, var_hi], and those for the remaining environments from [var_lo + int_var_lo, var_hi + int_var_hi]; the same holds for the sampling of psi. If obs=False, the latter interval is used for all environments. Note that this is not a necessary assumption for the UT-LVCE estimator, but it makes the actual intervention strength less sensitive to the random sampling of parameters.

  • random_state (int, default=42) – To set the random state for reproducibility. Successive calls with the same random state will return the same model.

  • verbose (int, default=0) – Whether debug and execution traces should be printed. 0 corresponds to no traces; higher values correspond to higher verbosity.
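
The effect of obs on the sampling intervals can be illustrated with a minimal sketch for the noise variance of a single variable. The helper sample_noise_variances is hypothetical and not part of utlvce; uniform sampling within each interval is an assumption, since the text above only states the interval bounds.

```python
import random


def sample_noise_variances(e, var_lo, var_hi, int_var_lo, int_var_hi,
                           obs=True, seed=42):
    """Return one sampled variance per environment for a single variable.

    Hypothetical helper (not part of utlvce). Uniform sampling is an
    assumption; only the interval bounds come from the documentation.
    """
    rng = random.Random(seed)
    base = (var_lo, var_hi)
    shifted = (var_lo + int_var_lo, var_hi + int_var_hi)
    variances = []
    for env in range(e):
        # With obs=True only the first environment uses the base interval.
        lo, hi = base if (obs and env == 0) else shifted
        variances.append(rng.uniform(lo, hi))
    return variances


# Same bounds as in the documented example call: e=5, var in [0.5, 0.6],
# intervention variances in [3, 6].
variances = sample_noise_variances(5, 0.5, 0.6, 3, 6)
print(variances)
```

With these bounds the first value falls in [0.5, 0.6] and the remaining four in [3.5, 6.6]; with obs=False all five fall in [3.5, 6.6].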

Returns

model – An instance of the model with the sampled parameters.

Return type

utlvce.model.Model

Raises

ValueError – If the intervention targets are not a subset of the variable indices, i.e. [0,…,p-1].

Examples

>>> chain_graph_model(20,{2},2,5,0.5,0.6,3,6,0.2,0.4,1,5,0.7,0.8,False,True,42,0) 
<utlvce.model.Model object at 0x...>
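
For intuition, a chain graph over p nodes can be written as an adjacency matrix in the convention used by sample_parameters (A[i,j] != 0 implies i -> j). The sketch below is illustrative only: the helper chain_adjacency is hypothetical, and the node ordering 0 -> 1 -> ... -> p-1 is an assumption about how the chain is laid out.

```python
def chain_adjacency(p):
    """Adjacency of the chain 0 -> 1 -> ... -> p-1 as a list of rows.

    Hypothetical helper (not part of utlvce); the ordering of the chain
    is an assumption made for illustration.
    """
    A = [[0] * p for _ in range(p)]
    for i in range(p - 1):
        A[i][i + 1] = 1  # edge i -> i+1
    return A


A = chain_adjacency(4)
for row in A:
    print(row)
# [0, 1, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# [0, 0, 0, 0]
```

The matrix is strictly upper triangular, so it is a DAG by construction.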

utlvce.generators.intervention_targets(p, num_targets, random_state=42)

Sample a set of intervention targets.

Parameters
  • p (int) – The number of variables, i.e. targets will be sampled from [0,p-1].

  • num_targets (int or tuple) – Specifies the number of targets. If a two-element tuple, the number of targets is sampled uniformly at random from [num_targets[0], num_targets[1]].

  • random_state (int, default=42) – To set the random state for reproducibility.

Returns

targets – A set with the indices of the intervention targets.

Return type

set

Raises

ValueError – If the given number of targets is invalid.

Examples

>>> intervention_targets(20, 3)
{1, 13, 14}
>>> intervention_targets(20, (1,10), random_state=1)
{0, 2, 8, 12, 17}
>>> intervention_targets(20, (1,10), random_state=2)
{1, 3, 4, 6, 8, 13, 18, 19}
>>> intervention_targets(10, 10)
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> intervention_targets(10, 0)
set()

Requesting more than p targets raises a ValueError:

>>> intervention_targets(10, 11)
Traceback (most recent call last):
...
ValueError: Invalid number of targets.
>>> intervention_targets(10, (0,11))
Traceback (most recent call last):
...
ValueError: Invalid number of targets.
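
The behaviour shown in these examples can be approximated with the standard library. This is a sketch under stated assumptions, not the library's implementation: sample_targets is a hypothetical name, it uses Python's random module rather than whatever generator utlvce uses, and so the sets it draws will generally differ from the outputs listed above.

```python
import random


def sample_targets(p, num_targets, seed=42):
    """Sample a set of target indices from range(p).

    Hypothetical stand-in for intervention_targets; the validation rules
    are inferred from the documented examples.
    """
    rng = random.Random(seed)
    if isinstance(num_targets, tuple):
        lo, hi = num_targets
        if not (0 <= lo <= hi <= p):
            raise ValueError("Invalid number of targets.")
        size = rng.randint(lo, hi)  # inclusive on both ends
    else:
        size = num_targets
        if not (0 <= size <= p):
            raise ValueError("Invalid number of targets.")
    # Sampling without replacement guarantees distinct targets.
    return set(rng.sample(range(p), size))


print(sample_targets(10, 10))   # all ten indices
print(sample_targets(10, 0))    # set()
print(len(sample_targets(20, (1, 10), seed=1)))
```

Fixing the seed makes repeated calls return the same set, which mirrors the role of random_state.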

utlvce.generators.random_graph_model(p, k, I, num_latent, e, var_lo, var_hi, int_var_lo, int_var_hi, psi_lo, psi_hi, int_psi_lo, int_psi_hi, B_lo, B_hi, sparse_latents=False, obs=True, random_state=42, verbose=0)

Generate a random model from a random Erdős–Rényi graph with p nodes and average degree k.

Parameters
  • p (int) – The number of observed variables in the model.

  • k (float) – The average degree of the underlying Erdős–Rényi graph.

  • I (set) – The set of intervention targets.

  • num_latent (int) – The number of latent variables in the model.

  • e (int) – The number of environments.

  • var_lo (float) – The lower bound for the variances of the noise terms of the observed variables.

  • var_hi (float) – The upper bound for the variances of the noise terms of the observed variables.

  • int_var_lo (float) – The lower bound for the intervention variances on the observed variables.

  • int_var_hi (float) – The upper bound for the intervention variances on the observed variables.

  • psi_lo (float) – The lower bound for the variances of the latent variables.

  • psi_hi (float) – The upper bound for the variances of the latent variables.

  • int_psi_lo (float) – The lower bound for the intervention variances on the latent variables.

  • int_psi_hi (float) – The upper bound for the intervention variances on the latent variables.

  • B_lo (float) – The lower bound for the edge weights between observed variables.

  • B_hi (float) – The upper bound for the edge weights between observed variables.

  • sparse_latents (bool, default=False) – Whether the gamma matrix of latent effects should be sparse (see source).

  • obs (bool, default=True) – Whether the first environment should be “observational”, i.e. whether the variances of the noise terms and latents should be lower (variable-wise) than in the other environments. With obs=True, the variances for the first environment are sampled from [var_lo, var_hi], and those for the remaining environments from [var_lo + int_var_lo, var_hi + int_var_hi]; the same holds for the sampling of psi. If obs=False, the latter interval is used for all environments. Note that this is not a necessary assumption for the UT-LVCE estimator, but it makes the actual intervention strength less sensitive to the random sampling of parameters.

  • random_state (int, default=42) – To set the random state for reproducibility. Successive calls with the same random state will return the same model.

  • verbose (int, default=0) – Whether debug and execution traces should be printed. 0 corresponds to no traces; higher values correspond to higher verbosity.

Returns

model – An instance of the model with the sampled parameters.

Return type

utlvce.model.Model

Raises

ValueError – If the intervention targets are not a subset of the variable indices, i.e. [0,…,p-1].

Examples

>>> random_graph_model(20,2.1,{2},2,5,0.5,0.6,3,6,0.2,0.4,1,5,0.7,0.8,False,True,42,0) 
<utlvce.model.Model object at 0x...>
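
One common way to obtain an Erdős–Rényi DAG with average degree k is to include each edge i -> j (for i < j) independently with probability k / (p - 1). The sketch below follows that construction; it is an assumption about the general technique, not a description of how utlvce builds its graphs, and random_dag_adjacency is a hypothetical helper.

```python
import random


def random_dag_adjacency(p, k, seed=42):
    """Upper-triangular Erdős–Rényi adjacency with expected average degree k.

    Hypothetical helper (not part of utlvce). A[i][j] != 0 means i -> j.
    """
    rng = random.Random(seed)
    # p*(p-1)/2 candidate edges, each kept with probability k/(p-1),
    # gives p*k/2 expected edges, i.e. an expected degree of k per node.
    prob = min(1.0, k / (p - 1)) if p > 1 else 0.0
    A = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if rng.random() < prob:
                A[i][j] = 1
    return A


A = random_dag_adjacency(20, 2.1)
num_edges = sum(sum(row) for row in A)
print("average degree:", 2 * num_edges / 20)
```

Restricting edges to i < j keeps the graph acyclic by construction; a library implementation may additionally permute the node labels.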

utlvce.generators.sample_parameters(A, I, num_latent, e, var_lo, var_hi, int_var_lo, int_var_hi, psi_lo, psi_hi, int_psi_lo, int_psi_hi, B_lo, B_hi, sparse_latents=False, obs=True, random_state=42, verbose=0)

Generate a random model given an adjacency matrix A and intervention targets I.

Parameters
  • A (numpy.ndarray) – The adjacency matrix of the DAG underlying the model, where A[i,j] != 0 implies i -> j.

  • I (set) – The set of intervention targets.

  • num_latent (int) – The number of latent variables in the model.

  • e (int) – The number of environments.

  • var_lo (float) – The lower bound for the variances of the noise terms of the observed variables.

  • var_hi (float) – The upper bound for the variances of the noise terms of the observed variables.

  • int_var_lo (float) – The lower bound for the intervention variances on the observed variables.

  • int_var_hi (float) – The upper bound for the intervention variances on the observed variables.

  • psi_lo (float) – The lower bound for the variances of the latent variables.

  • psi_hi (float) – The upper bound for the variances of the latent variables.

  • int_psi_lo (float) – The lower bound for the intervention variances on the latent variables.

  • int_psi_hi (float) – The upper bound for the intervention variances on the latent variables.

  • B_lo (float) – The lower bound for the edge weights between observed variables.

  • B_hi (float) – The upper bound for the edge weights between observed variables.

  • sparse_latents (bool, default=False) – Whether the gamma matrix of latent effects should be sparse (see source).

  • obs (bool, default=True) – Whether the first environment should be “observational”, i.e. whether the variances of the noise terms and latents should be lower (variable-wise) than in the other environments. With obs=True, the variances for the first environment are sampled from [var_lo, var_hi], and those for the remaining environments from [var_lo + int_var_lo, var_hi + int_var_hi]; the same holds for the sampling of psi. If obs=False, the latter interval is used for all environments. Note that this is not a necessary assumption for the UT-LVCE estimator, but it makes the actual intervention strength less sensitive to the random sampling of parameters.

  • random_state (int, default=42) – To set the random state for reproducibility. Successive calls with the same random state will return the same model.

  • verbose (int, default=0) – Whether debug and execution traces should be printed. 0 corresponds to no traces; higher values correspond to higher verbosity.

Returns

model – An instance of the model with the sampled parameters.

Return type

utlvce.model.Model

Raises

ValueError – If the given adjacency is not a DAG or the intervention targets are not a subset of the variable indices, i.e. [0,…,p-1].

Examples

>>> import numpy as np
>>> A = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 0]])
>>> sample_parameters(A,{2},2,5,0.5,0.6,3,6,0.2,0.4,1,5,0.7,0.8,False,True,42,0) 
<utlvce.model.Model object at 0x...>

Passing intervention targets that are not a subset of [0,…,p-1] raises a ValueError:

>>> sample_parameters(A,{3},2,5,0.5,0.6,3,6,0.2,0.4,1,5,0.7,0.8,False,True,42,0)
Traceback (most recent call last):
...
ValueError: The intervention targets must be a subset of [0,...,p-1].

A ValueError is raised if the given adjacency does not correspond to a DAG (e.g. it contains cycles):

>>> A = np.array([[0, 0, 1], [0, 0, 1], [1, 0, 0]])
>>> sample_parameters(A,{2},2,5,0.5,0.6,3,6,0.2,0.4,1,5,0.7,0.8,False,True,42,0)
Traceback (most recent call last):
...
ValueError: The given adjacency does not correspond to a DAG.
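
The two input checks described above can be sketched without any dependencies. The helpers is_dag and check_inputs are hypothetical and not part of utlvce; the acyclicity test uses Kahn's topological-sort algorithm, which is one standard choice and not necessarily what the library does, and the order in which the two checks run is also an assumption.

```python
def is_dag(A):
    """Return True if the adjacency (A[i][j] != 0 means i -> j) is acyclic."""
    p = len(A)
    in_degree = [sum(1 for i in range(p) if A[i][j] != 0) for j in range(p)]
    queue = [j for j in range(p) if in_degree[j] == 0]
    visited = 0
    while queue:
        i = queue.pop()
        visited += 1
        for j in range(p):
            if A[i][j] != 0:
                in_degree[j] -= 1
                if in_degree[j] == 0:
                    queue.append(j)
    # Nodes on a cycle never reach in-degree zero, so they are never visited.
    return visited == p


def check_inputs(A, I):
    """Raise ValueError for a cyclic adjacency or out-of-range targets."""
    p = len(A)
    if not is_dag(A):
        raise ValueError("The given adjacency does not correspond to a DAG.")
    if not set(I) <= set(range(p)):
        raise ValueError("The intervention targets must be a subset of [0,...,p-1].")


dag = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
cyclic = [[0, 0, 1], [0, 0, 1], [1, 0, 0]]
print(is_dag(dag))     # True
print(is_dag(cyclic))  # False
```

The two matrices are the ones used in the examples above, written as nested lists; wrapping them in a NumPy array would work with the same indexing only after switching A[i][j] to A[i, j] or leaving it as is, since both forms are valid for arrays.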