hidimstat.clustered_inference

hidimstat.clustered_inference(X_init, y, ward, n_clusters, train_size=1.0, groups=None, method='desparsified-lasso', seed=0, n_jobs=1, memory=None, verbose=1, **kwargs)

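Clustered inference algorithm: the features are compressed into n_clusters clusters, then statistical inference is run on the compressed data.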
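The compression step can be illustrated with scikit-learn alone. The following is a minimal sketch under stated assumptions, not the library's internal code: the data are synthetic, no connectivity is passed to the clustering, and drawing the subsample with `rng.choice` is only one plausible reading of how `train_size` is applied.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
n_samples, n_features, n_clusters = 60, 200, 20

# Synthetic data standing in for X_init (hypothetical values).
X_init = rng.standard_normal((n_samples, n_features))

# Ward clustering of the features; no spatial connectivity is given here.
ward = FeatureAgglomeration(n_clusters=n_clusters, linkage="ward")

# Assumption: with train_size < 1, the clustering is fitted on a random
# subsample of the rows, drawn without replacement.
train_size = 0.7
n_train = int(n_samples * train_size)
train_index = rng.choice(n_samples, size=n_train, replace=False)
ward.fit(X_init[train_index])

# Compress all samples: each cluster is summarised by the mean of its features.
X_reduced = ward.transform(X_init)

# Cluster-level quantities can be broadcast back to the original features.
cluster_values = np.arange(n_clusters, dtype=float)
feature_values = ward.inverse_transform(cluster_values.reshape(1, -1)).ravel()

print(X_reduced.shape)       # (60, 20)
print(feature_values.shape)  # (200,)
```

Inference then operates on the `n_clusters` columns of the compressed design rather than on the `n_features` original columns, which is what makes the problem tractable in high dimension.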

Parameters
X_init: ndarray, shape (n_samples, n_features)

Original data (uncompressed).

y: ndarray, shape (n_samples,) or (n_samples, n_times)

Target.

ward: sklearn.cluster.FeatureAgglomeration

Scikit-learn object that computes Ward hierarchical clustering.

n_clusters: int

Number of clusters used for the compression.

train_size: float, optional (default=1.0)

Fraction of samples used to compute the clustering. If train_size = 1, all samples are used, so the clustering is not random.

groups: ndarray, shape (n_samples,), optional (default=None)

Group labels for every sample. If not None, groups is used to build the subsamples on which the clustering is computed.

method: str, optional (default='desparsified-lasso')

Method used for the inference. Two methods are currently available: 'desparsified-lasso' and 'group-desparsified-lasso'. Use 'desparsified-lasso' for non-temporal data and 'group-desparsified-lasso' for temporal data.

seed: int, optional (default=0)

Seed used for generating a random subsample of the data. This seed controls the clustering randomness.

n_jobs: int or None, optional (default=1)

Number of CPUs to use during parallel steps such as inference.

memory: str or joblib.Memory object, optional (default=None)

Used to cache the output of the clustering and of the inference. By default, no caching is done. If a string is given, it is the path to the caching directory.

verbose: int, optional (default=1)

The verbosity level. If verbose > 0, a message is printed before running the clustered inference.

**kwargs:

Arguments passed to the statistical inference function.
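The arguments above could be assembled as in the following sketch. It is an illustration under assumptions, not an official example: the data are synthetic, the features are assumed to lie on a 1D grid (hence the `grid_to_graph` connectivity), and `choose_method` and `run_clustered_inference` are hypothetical helpers introduced here. Only the call signature itself is taken from this page.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph


def choose_method(y):
    """Hypothetical helper: pick `method` from the shape of the target."""
    # 1D target -> non-temporal data; 2D target (n_samples, n_times) -> temporal.
    return "desparsified-lasso" if np.ndim(y) == 1 else "group-desparsified-lasso"


def run_clustered_inference(X_init, y, ward, n_clusters, **options):
    """Hypothetical wrapper around hidimstat.clustered_inference."""
    # Imported lazily so the setup below runs even without hidimstat installed.
    from hidimstat import clustered_inference

    return clustered_inference(
        X_init, y, ward, n_clusters, method=choose_method(y), **options
    )


rng = np.random.default_rng(0)
n_samples, n_features, n_clusters = 50, 100, 10

# Synthetic data (hypothetical): only the first five features carry signal.
X_init = rng.standard_normal((n_samples, n_features))
y = X_init[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n_samples)

# Assumption: features lie on a 1D grid, so only neighbours may be merged.
connectivity = grid_to_graph(n_x=n_features, n_y=1, n_z=1)
ward = FeatureAgglomeration(
    n_clusters=n_clusters, connectivity=connectivity, linkage="ward"
)

# Keyword arguments mirroring the parameters documented above.
options = dict(train_size=0.7, groups=None, seed=0, n_jobs=1, memory=None, verbose=0)

print(choose_method(y))  # desparsified-lasso
```

With hidimstat installed, `run_clustered_inference(X_init, y, ward, n_clusters, **options)` would forward these values to `clustered_inference`. Passing a directory path or a `joblib.Memory` object as `memory` instead of `None` would enable caching.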

Returns
beta_hat: ndarray, shape (n_features,) or (n_features, n_times)

Estimated parameter vector or matrix.

pval: ndarray, shape (n_features,)

p-value, with numerically accurate values for positive effects (i.e., for p-values close to zero).

pval_corr: ndarray, shape (n_features,)

p-value corrected for multiple testing.

one_minus_pval: ndarray, shape (n_features,)

One minus the p-value, with numerically accurate values for negative effects (i.e., for p-values close to one).

one_minus_pval_corr: ndarray, shape (n_features,)

One minus the p-value corrected for multiple testing.
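Why both a p-value and its complement are returned can be seen from floating-point arithmetic. The sketch below uses hypothetical z-scores and assumes, for illustration only, that one-sided p-values are derived from them with `scipy.stats.norm`. It does not reproduce how the library computes its outputs.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical z-scores: strongly positive, null, mildly negative, strongly negative.
zscore = np.array([10.0, 0.0, -1.0, -10.0])

# Assumption for illustration: one-sided p-values derived from z-scores.
pval = norm.sf(zscore)             # accurate near 0 (positive effects)
one_minus_pval = norm.cdf(zscore)  # accurate near 0 (negative effects)

# Recomputing 1 - pval from pval loses the tiny value for the negative effect.
print(1.0 - pval[-1])          # 0.0
print(one_minus_pval[-1] > 0)  # True

# Two-sided selection at level alpha: keep features significant in either tail.
alpha = 0.1
selected = np.logical_or(pval < alpha / 2, one_minus_pval < alpha / 2)
print(selected)  # [ True False False  True]
```

To account for multiple testing, the same comparison would be applied to `pval_corr` and `one_minus_pval_corr` rather than to the uncorrected arrays.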

References

[1] Chevalier, J. A., Nguyen, T. B., Thirion, B., & Salmon, J. (2021). Spatially relaxed inference on high-dimensional linear models. arXiv preprint arXiv:2106.02590.