Abstract: Most modern machine learning models require at least one hyperparameter to be chosen by the user before the learning phase. Popular approaches, such as grid-search or random-search, evaluate the performance of the model for a given criterion over a grid of candidate values, which means fitting the model once for each point of the grid.
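As an illustration, here is a minimal sketch of the grid-search baseline using scikit-learn; the estimator, grid values, and criterion below are illustrative choices, not those of the presentation:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Toy regression data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(100)

# Grid-search: the model is refit for every point of the Cartesian
# product of the per-hyperparameter grids (here 10 x 5 = 50 fits,
# times the number of cross-validation folds).
param_grid = {
    "alpha": np.logspace(-3, 1, 10),
    "l1_ratio": np.linspace(0.1, 0.9, 5),
}
search = GridSearchCV(ElasticNet(max_iter=10_000), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_)
```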
These methods have a major drawback: the size of the grid, and hence the number of model fits, grows exponentially with the number of hyperparameters. In this presentation, we will show that hyperparameter selection can be cast as a bilevel optimization problem, and we will consider non-smooth models such as the Lasso, the Elastic Net, and the SVM.
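For concreteness, one common way to write this bilevel problem, taking a held-out loss as the outer criterion and the Lasso as the inner problem (the exact criterion and model used in the presentation may differ), is

\[
\hat{\lambda} \in \operatorname*{arg\,min}_{\lambda > 0} \;
\mathcal{C}\big(\hat{\beta}^{(\lambda)}\big)
\triangleq \frac{1}{2 n_{\mathrm{val}}} \big\lVert y^{\mathrm{val}} - X^{\mathrm{val}} \hat{\beta}^{(\lambda)} \big\rVert_2^2
\quad \text{s.t.} \quad
\hat{\beta}^{(\lambda)} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \;
\frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 ,
\]

where the outer problem selects the hyperparameter and the inner problem is the (non-smooth) model fit for a fixed value of that hyperparameter.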
We propose a first-order method that uses the gradient of the criterion with respect to the hyperparameters (the hypergradient) to automatically select the hyperparameters that are best for a given criterion.
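To give a flavour of such a first-order method, here is a minimal sketch of hypergradient descent in which, for simplicity, the inner problem is a ridge regression, so the implicit derivative of the inner solution has a closed form; handling non-smooth inner problems such as the Lasso requires the more careful implicit differentiation discussed in the presentation, and all names below are illustrative:

```python
import numpy as np

def ridge_solution(X, y, lam):
    """Inner problem: beta(lam) = argmin ||y - X b||^2 / (2n) + lam * ||b||^2 / 2."""
    n, p = X.shape
    A = X.T @ X / n + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y / n), A

def hypergradient(X, y, X_val, y_val, lam):
    """Gradient of the held-out loss C(lam) = ||y_val - X_val beta(lam)||^2 / (2 n_val)
    with respect to lam, via implicit differentiation of the inner optimality
    condition (X^T X / n + lam I) beta = X^T y / n."""
    beta, A = ridge_solution(X, y, lam)
    n_val = X_val.shape[0]
    grad_beta = -X_val.T @ (y_val - X_val @ beta) / n_val  # dC/dbeta
    dbeta_dlam = -np.linalg.solve(A, beta)                  # implicit derivative
    return grad_beta @ dbeta_dlam

# Toy train / validation split.
rng = np.random.default_rng(0)
X, X_val = rng.standard_normal((100, 20)), rng.standard_normal((50, 20))
w = rng.standard_normal(20)
y = X @ w + 0.1 * rng.standard_normal(100)
y_val = X_val @ w + 0.1 * rng.standard_normal(50)

# Gradient descent on log(lam) to keep lam positive.
log_lam, step = 0.0, 0.1
for _ in range(300):
    lam = np.exp(log_lam)
    log_lam -= step * lam * hypergradient(X, y, X_val, y_val, lam)  # chain rule
print("selected lambda:", np.exp(log_lam))
```

Each iteration requires solving the inner problem once and one extra linear solve for the hypergradient, so the cost does not depend on the size of any grid.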
We will see that this method remains efficient even when the number of hyperparameters is large.