## Model-assisted estimation through random forests in finite population sampling


Mehdi Dagdoug
(Université de Bourgogne-Franche-Comté)

Estimation of finite population totals is of primary interest in survey sampling. Auxiliary information is often available at the population level. Model-assisted estimators use this auxiliary information to build predictors of the survey variable and incorporate these predictors into the estimator. In this work, a new class of model-assisted estimators based on random forests is proposed.
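For orientation, one common way to write a model-assisted estimator of the total $t_y = \sum_{k \in U} y_k$ is the generalized difference form below. The notation ($U$ the population, $S$ the sample, $\pi_k$ the inclusion probability of unit $k$, $\mathbf{x}_k$ its auxiliary vector, $\hat{m}$ a predictor fitted on the sample) is introduced here for illustration and is an assumption; the exact form of the estimators in this work is not stated in this abstract.

$$
\hat{t}_{y,\mathrm{MA}} = \sum_{k \in U} \hat{m}(\mathbf{x}_k) + \sum_{k \in S} \frac{y_k - \hat{m}(\mathbf{x}_k)}{\pi_k}
$$

The first sum uses the predictor over the whole population; the second is a Horvitz-Thompson-type correction on the sample residuals, which is what keeps such estimators approximately design unbiased even when the model is misspecified.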
Generally speaking, a random forest is an ensemble method that grows a large number of regression trees and aggregates their predictions, typically producing more accurate predictions than a single regression tree would.
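To make the ensemble idea concrete, here is a minimal from-scratch sketch of a regression random forest in Python: each tree is grown on a bootstrap resample, each split considers a random subset of `mtry` auxiliary variables, and the forest prediction is the average of the tree predictions. The function names (`fit_forest`, `predict_forest`, `build_tree`) and the default tuning values are illustrative assumptions, not the implementation used in this work.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, min_leaf, mtry, rng):
    """Greedily grow a regression tree; a leaf is the mean response of its units."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (sse, feature index, threshold)
    # Random subset of auxiliary variables considered at this split
    for j in rng.sample(range(len(X[0])), mtry):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # no admissible split: make a leaf
        return mean(y)
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth + 1, max_depth, min_leaf, mtry, rng)
    return (j, t, grow(li), grow(ri))


def predict_tree(node, x):
    """Follow the splits down to a leaf and return its mean."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=4, min_leaf=2, mtry=None, seed=0):
    """Fit n_trees trees, each on a bootstrap resample of the sample."""
    rng = random.Random(seed)
    mtry = mtry or max(1, len(X[0]) // 3)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, min_leaf, mtry, rng))
    return trees


def predict_forest(trees, x):
    """Forest prediction: average of the individual tree predictions."""
    return mean(predict_tree(t, x) for t in trees)
```

In a model-assisted setting, the forest would be fitted on the sampled pairs $(\mathbf{x}_k, y_k)$ and then used to predict $\hat{m}(\mathbf{x}_k)$ for every population unit. In practice one would typically rely on an established library rather than this toy version.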
Under mild conditions, the proposed estimators are shown to be asymptotically design unbiased and consistent. Their asymptotic variance is derived, and a consistent variance estimator is suggested. The asymptotic distribution of the estimators is also derived, allowing for the use of normal-based confidence intervals.
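As a hedged illustration of how a point estimate, a variance estimate and a normal-based confidence interval fit together, the sketch below assumes simple random sampling without replacement (so every $\pi_k = n/N$), takes the predictions $\hat{m}(\mathbf{x}_k)$ for all population units as given (for instance from a random forest fitted on the sample), and uses the familiar residual-based variance estimator $N^2 (1 - n/N) s_e^2 / n$. The function name `model_assisted_total` and this particular variance formula are assumptions for illustration; the variance estimator proposed in this work may differ, in particular in how the residuals are computed.

```python
import math
from statistics import NormalDist, variance


def model_assisted_total(m_pop, sample_idx, y_sample, alpha=0.05):
    """Model-assisted total, variance estimate and normal CI under SRSWOR (sketch).

    m_pop      : predictions m_hat(x_k) for every population unit k = 0..N-1
    sample_idx : indices of the n sampled units (n >= 2)
    y_sample   : observed survey values for those units, in the same order
    """
    N, n = len(m_pop), len(sample_idx)
    pi = n / N  # inclusion probability under simple random sampling w/o replacement
    residuals = [y - m_pop[k] for k, y in zip(sample_idx, y_sample)]
    # Sum of predictions over the population plus a Horvitz-Thompson
    # correction based on the sample residuals
    t_hat = sum(m_pop) + sum(e / pi for e in residuals)
    # Residual-based variance estimator (assumed form), s_e^2 with n - 1 denominator
    v_hat = N ** 2 * (1 - n / N) * variance(residuals) / n
    # Normal-based (1 - alpha) confidence interval
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(v_hat)
    return t_hat, v_hat, (t_hat - half_width, t_hat + half_width)
```

For example, with four population units whose predictions are `[1, 2, 3, 4]` and a sample of units 0 and 2 observing `y = [2, 3]`, the residuals are 1 and 0, so the estimated total is 10 + 1/0.5 = 12 and the variance estimate is 16 × 0.5 × 0.5 / 2 = 2.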
Simulations illustrate that the proposed estimators are robust and can outperform state-of-the-art estimators, especially in difficult settings such as small sample sizes or high-dimensional auxiliary information.