Statistical control of sparse models in high dimension

Abstract : In this thesis, we focus on the multivariate inference problem in the context of high-dimensional structured data. More precisely, given a set of explanatory variables (features) and a target, we aim at recovering the features that are predictive conditionally on the others, i.e., recovering the support of a linear predictive model. We concentrate on methods that come with statistical guarantees, since we want to control the occurrence of false discoveries. This is relevant to inference problems on high-resolution images, where one aims at pixel- or voxel-level analysis, e.g., in neuroimaging or astronomy, but also to other settings where features have a spatial structure, e.g., in genomics. In such settings, existing procedures are not helpful for support recovery since they lack power and are generally not tractable. The problem is thus hard both from the statistical modeling point of view and from a computational perspective. However, feature values in these settings typically reflect the underlying spatial structure, which can be leveraged for inference. For example, in neuroimaging, a brain image has a 3D representation and a given voxel is highly correlated with its neighbors. We notably propose the ensemble of clustered desparsified Lasso (ecd-Lasso) estimator, which combines three steps: i) a spatially constrained clustering procedure that reduces the problem dimension while taking the data structure into account, ii) the desparsified Lasso (d-Lasso) statistical inference procedure, which is tractable on reduced versions of the original problem, and iii) an ensembling method that aggregates the solutions of different compressed versions of the problem, to avoid relying on a single arbitrary choice of data clustering. We also consider new ways to control the occurrence of false discoveries with a given spatial tolerance, a form of control that is well adapted to spatially structured data.
In this work, we focus on neuroimaging datasets, but the methods that we present can be adapted to other fields that share a similar setup.
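
The three-step structure described in the abstract (cluster, infer on the reduced problem, aggregate over clusterings) can be illustrated with a toy sketch. This is not the thesis implementation, and every name in it is hypothetical. It rests on several simplifying assumptions: features lie on a 1D grid, random contiguous blocks stand in for the spatially constrained clustering, ordinary least squares with a normal approximation stands in for the desparsified Lasso (only workable here because the reduced problem has fewer clusters than samples), and p-values are combined by quantile aggregation, which is one possible ensembling rule and is assumed rather than taken from the abstract.

```python
import math
import numpy as np


def contiguous_clusters(n_features, n_clusters, rng):
    """Assumed stand-in for spatially constrained clustering:
    split a 1D feature grid into random contiguous blocks."""
    cuts = np.sort(rng.choice(np.arange(1, n_features),
                              size=n_clusters - 1, replace=False))
    bounds = np.concatenate(([0], cuts, [n_features]))
    labels = np.empty(n_features, dtype=int)
    for k in range(n_clusters):
        labels[bounds[k]:bounds[k + 1]] = k
    return labels


def reduce_features(X, labels):
    """Compress the design matrix by averaging the features of each cluster."""
    n_clusters = labels.max() + 1
    return np.column_stack([X[:, labels == k].mean(axis=1)
                            for k in range(n_clusters)])


def ols_pvalues(Z, y):
    """Two-sided p-values from OLS z-scores (normal approximation).
    Placeholder for the d-Lasso inference step, NOT the d-Lasso itself."""
    n, q = Z.shape
    gram_inv = np.linalg.inv(Z.T @ Z)
    beta = gram_inv @ Z.T @ y
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - q)
    se = np.sqrt(sigma2 * np.diag(gram_inv))
    return np.array([math.erfc(abs(b / s) / math.sqrt(2))
                     for b, s in zip(beta, se)])


def ensemble_pvalues(X, y, n_clusters=10, n_clusterings=25, gamma=0.5, seed=0):
    """Repeat clustering + inference, map cluster p-values back to features,
    then combine them across clusterings by quantile aggregation (assumed)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    all_p = np.empty((n_clusterings, n_features))
    for b in range(n_clusterings):
        labels = contiguous_clusters(n_features, n_clusters, rng)
        Z = reduce_features(X, labels)
        all_p[b] = ols_pvalues(Z, y)[labels]  # each feature inherits its cluster's p-value
    return np.minimum(1.0, np.quantile(all_p, gamma, axis=0) / gamma)


# Synthetic data: spatially smooth features, support on a contiguous block.
rng = np.random.default_rng(42)
n_samples, n_features = 200, 100
noise = rng.standard_normal((n_samples, n_features))
kernel = np.ones(5) / 5.0
X = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, noise)
beta_true = np.zeros(n_features)
beta_true[40:60] = 1.0
y = X @ beta_true + rng.standard_normal(n_samples)
X = X - X.mean(axis=0)
y = y - y.mean()

pvals = ensemble_pvalues(X, y)
print(np.round(pvals[35:65], 3))
```

Because every feature inherits the p-value of its cluster, a feature just outside the true support can be flagged when it shares a cluster with active features. This is the kind of behavior that motivates controlling false discoveries only up to a spatial tolerance, as the abstract describes.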
Document type : Theses

https://tel.archives-ouvertes.fr/tel-03147200
Contributor : ABES STAR
Submitted on : Friday, February 19, 2021 - 4:26:12 PM
Last modification on : Wednesday, June 15, 2022 - 8:39:46 PM
Long-term archiving on : Thursday, May 20, 2021 - 7:53:38 PM

File

98278_CHEVALIER_2020_archivage...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03147200, version 1

Citation

Jérôme-Alexis Chevalier. Statistical control of sparse models in high dimension. Machine Learning [stat.ML]. Université Paris-Saclay, 2020. English. ⟨NNT : 2020UPASG051⟩. ⟨tel-03147200⟩
