Thursday, March 3, 2022
Mehdi Dagdoug (Université de Franche-Comté)
Regression tree and random forest imputation in surveys
Surveys are used to gather data from finite populations and to estimate finite population parameters such as totals, means, or quantiles. Often, some sampled units refuse to cooperate, which results in missing data. Nonresponse in surveys is usually handled through some form of imputation. Regression trees and random forests provide flexible tools for obtaining a set of imputed values. In this work, we provide a mathematical analysis of estimators based on tree and random forest imputation. Their finite-sample properties reveal a stability property of tree-based estimators that is lost for estimators based on small forests and recovered with large forests. The -consistency of these estimators is established. We also present the results of a simulation study that investigates the bias and efficiency of point estimators based on tree and random forest imputation.
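To make the general idea concrete, here is a minimal sketch, not the estimator analysed in the talk. It fits an unweighted CART-style regression tree (or a small bagged forest with random covariate subsets at each split) on the respondents, imputes each nonrespondent by its prediction, and plugs the completed data into a Horvitz-Thompson-type estimator of the total: t_hat = (sum over respondents of y_k / pi_k) + (sum over nonrespondents of yhat_k / pi_k), where pi_k is the inclusion probability of unit k. The notation, the function names (`build_tree`, `predict_tree`, `fit_forest`, `predict_forest`, `imputed_total`), the squared-error splitting rule, and the depth and leaf-size settings are all assumptions made for illustration.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared deviations from the mean (squared-error split criterion)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, max_depth=3, min_leaf=5, n_features=None, rng=None):
    """Fit a CART-style regression tree (hypothetical helper), returned as nested dicts.

    If n_features and rng are given, each split only considers a random
    subset of n_features covariates, as in a random forest.
    """
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return {"leaf": mean(y)}
    features = list(range(len(X[0])))
    if n_features is not None and rng is not None:
        features = rng.sample(features, min(n_features, len(features)))
    best = None  # (sse, feature index, threshold)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if min(len(left), len(right)) < max(1, min_leaf):
                continue  # both children must be non-empty and large enough
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:  # no admissible split: make a leaf
        return {"leaf": mean(y)}
    _, j, t = best
    go_left = [row[j] <= t for row in X]
    children = {}
    for side, flag in (("left", True), ("right", False)):
        Xs = [row for row, g in zip(X, go_left) if g == flag]
        ys = [yi for yi, g in zip(y, go_left) if g == flag]
        children[side] = build_tree(Xs, ys, max_depth - 1, min_leaf, n_features, rng)
    return {"feature": j, "threshold": t, **children}


def predict_tree(node, x):
    """Follow the splits down to a leaf and return the leaf mean."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def fit_forest(X, y, n_trees=50, max_depth=3, min_leaf=5, n_features=1, seed=0):
    """Bagged trees: each tree is fit on a bootstrap sample of the respondents."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, min_leaf, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees in the forest."""
    return mean([predict_tree(tree, x) for tree in forest])


def imputed_total(y, respond, X, pi, predict):
    """Horvitz-Thompson-type total: observed y_k for respondents, predict(x_k) otherwise."""
    return sum((yk if rk else predict(xk)) / pk
               for yk, rk, xk, pk in zip(y, respond, X, pi))
```

For example, with a step-shaped study variable, 20 sampled units, every fourth unit missing, and pi_k = 0.5 for all units, a depth-2 tree fit on the 15 respondents imputes the missing values exactly, so the imputed total equals the full-response Horvitz-Thompson total of 200. In real data the imputed values only approximate the missing ones, and the choice between a single tree and a forest trades off the stability and bias properties discussed in the abstract.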