Events

DMS Statistics and Data Science Seminar

Time: Sep 08, 2022 (02:00 PM)
Location: Zoom

Details:

Speaker: Maarten Jansen (Université Libre de Bruxelles)

Title: The use of information criteria in high dimensional graph and tree model selection

Abstract: The selection of a sparse subset in a high-dimensional set of parameters is often formulated as a regularized maximum likelihood problem, where a complexity norm has the role of the regularizing constraint. In the lasso, the complexity norm is the sum of the absolute values of the selected estimates, which is well known to have a shrinkage effect on the estimation. Shrinkage reduces the effect of false positive selections, which are a major issue in high-dimensional models with only a few true nonzero parameters. As a result, shrinkage makes the lasso too tolerant of the presence of false positives. Therefore, the lasso regularization (or smoothing) parameter is typically chosen so that false positives are controlled at a given rate or even (asymptotically) eliminated. As an alternative, we propose new variants of information criteria (such as AIC or Mallows's Cp) for fine-tuning the selection in the absence of shrinkage, balancing false positives and false negatives, bias and variance, closeness and complexity. The link with the degrees of freedom is explained. The approach is applied to graphical models for multivariate normal random variables and to trees (such as CART). The latter is illustrated on a problem of change point detection in Poisson intensities using adaptive unbalanced Haar wavelet transforms.