hidimstat.desparsified_lasso

hidimstat.desparsified_lasso(X, y, dof_ajdustement=False, confidence=0.95, max_iter=5000, tol=0.001, residual_method='lasso', alpha_max_fraction=0.01, n_jobs=1, memory=None, verbose=0)

Desparsified Lasso with confidence intervals

Parameters

X: ndarray, shape (n_samples, n_features): Data.
y: ndarray, shape (n_samples,): Target.
dof_ajdustement: bool, optional (default=False): If True, applies the degrees-of-freedom adjustment (cf. [4] and [5]). Otherwise, the original Desparsified Lasso estimator is computed (cf. [1], [2] and [3]).
confidence: float, optional (default=0.95): Confidence level used to compute the confidence intervals. The value must be in the range [0, 1].
max_iter: int, optional (default=5000): The maximum number of iterations when regressing, by Lasso, each column of the design matrix against the others.
tol: float, optional (default=1e-3): The tolerance for the optimization of the Lasso problems: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
residual_method: str, optional (default='lasso'): Method used for computing the residuals of the Nodewise Lasso. Currently the only method available is 'lasso'.
alpha_max_fraction: float, optional (default=0.01): Only used if residual_method='lasso'. Then alpha = alpha_max_fraction * alpha_max.
n_jobs: int or None, optional (default=1): Number of CPUs to use during the Nodewise Lasso.
memory: str or joblib.Memory object, optional (default=None): Used to cache the output of the computation of the Nodewise Lasso. By default, no caching is done. If a string is given, it is the path to the caching directory.
verbose: int, optional (default=0): The verbosity level: if nonzero, progress messages are printed when computing the Nodewise Lasso in parallel. The frequency of the messages increases with the verbosity level.
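
The rule alpha = alpha_max_fraction * alpha_max can be illustrated with NumPy alone. The sketch below is not the library's code. It assumes that alpha_max is the smallest penalty for which the Lasso solution is entirely zero, under the objective (1 / (2 n)) * ||x_j - X_minus_j w||^2 + alpha * ||w||_1. The helper nodewise_alpha is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 8
X = rng.standard_normal((n_samples, n_features))
X = X - X.mean(axis=0)  # center the columns, as described in the Notes

alpha_max_fraction = 0.01


def nodewise_alpha(X, j, alpha_max_fraction):
    """Hypothetical helper: penalty used when regressing column j on the others."""
    n_samples = X.shape[0]
    x_j = X[:, j]
    X_minus_j = np.delete(X, j, axis=1)
    # Assumed definition of alpha_max: smallest penalty giving an all-zero
    # Lasso solution for (1 / (2 n)) * ||x_j - X_minus_j w||^2 + alpha * ||w||_1.
    alpha_max = np.max(np.abs(X_minus_j.T @ x_j)) / n_samples
    return alpha_max_fraction * alpha_max


# One penalty per column, since each column has its own Nodewise Lasso problem.
alphas = np.array(
    [nodewise_alpha(X, j, alpha_max_fraction) for j in range(n_features)]
)
print(alphas)
```

Under this assumption, a smaller alpha_max_fraction gives a weaker penalty and therefore denser Nodewise Lasso solutions.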

Returns

beta_hat: array, shape (n_features,): Estimated parameter vector.
cb_min: array, shape (n_features,): Lower bound of the confidence intervals on the parameter vector.
cb_max: array, shape (n_features,): Upper bound of the confidence intervals on the parameter vector.
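
To show how confidence relates to cb_min and cb_max, here is a minimal sketch of a two-sided Gaussian interval around each coefficient. It assumes symmetric, asymptotically normal intervals. The values in beta_hat and std_err are made up for illustration and are not outputs of the function.

```python
from statistics import NormalDist

import numpy as np

confidence = 0.95

# Made-up estimates and standard errors, for illustration only.
beta_hat = np.array([1.2, 0.0, -0.7])
std_err = np.array([0.10, 0.20, 0.15])

# Two-sided standard normal quantile for the requested confidence level.
quantile = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Symmetric interval around each estimated coefficient.
cb_min = beta_hat - quantile * std_err
cb_max = beta_hat + quantile * std_err

# A coefficient is flagged as nonzero when its interval excludes zero.
excludes_zero = (cb_min > 0) | (cb_max < 0)
print(cb_min, cb_max, excludes_zero)
```

Raising confidence increases the quantile and widens every interval.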

Notes

The columns of X and the target y are always centered. This ensures that the intercepts of the Nodewise Lasso problems and the intercept of the noise model are all equal to zero. Since the values of the intercepts are not of interest, centering avoids handling unnecessary additional parameters. Also consider centering and scaling X beforehand, notably if the data in X has not been prescaled from measurements.
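
The effect of centering on the intercept can be checked numerically. The sketch below uses ordinary least squares instead of the Lasso to keep the example short, which is a simplification. The synthetic data and all variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 40

# Synthetic data with non-zero column means and a non-zero intercept.
X = rng.standard_normal((n_samples, 3)) + 5.0
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.standard_normal(n_samples)

# Center the columns of X and the target y.
X_c = X - X.mean(axis=0)
y_c = y - y.mean()

# Least squares with an explicit intercept column, on raw and centered data.
A_raw = np.column_stack([np.ones(n_samples), X])
A_c = np.column_stack([np.ones(n_samples), X_c])
coef_raw, *_ = np.linalg.lstsq(A_raw, y, rcond=None)
coef_c, *_ = np.linalg.lstsq(A_c, y_c, rcond=None)

intercept_c = coef_c[0]  # numerically zero after centering
print(intercept_c, coef_raw[1:], coef_c[1:])
```

The slopes are the same with and without centering; only the intercept changes, which is why it can be dropped from the model once the data are centered.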

References

[1] Zhang, C. H., & Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 76(1), 217-242.
[2] Van de Geer, S., Bühlmann, P., Ritov, Y. A., & Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42(3), 1166-1202.
[3] Javanmard, A., & Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15(1), 2869-2909.
[4] Bellec, P. C., & Zhang, C. H. (2019). De-biasing the lasso with degrees-of-freedom adjustment. arXiv preprint arXiv:1902.08885.
[5] Celentano, M., Montanari, A., & Wei, Y. (2020). The Lasso with general Gaussian designs with applications to hypothesis testing. arXiv preprint arXiv:2007.13716.