Survey sampling with nonparametric methods: endogenous post-stratification and penalized instrumental variables
Date
2012
Authors
Dahlke, Mark, author
Breidt, F. Jay, advisor
Opsomer, Jean, committee member
Lee, Myung-Hee, committee member
Pezeshki, Ali, committee member
Abstract
Two topics related to the common theme of nonparametric techniques in survey sampling are examined. The first topic explores the estimation of a finite population mean via post-stratification. Post-stratification is used to improve the precision of survey estimators when categorical auxiliary information is available from external sources. In natural resource surveys, such information may be obtained from remote sensing data classified into categories and displayed as maps. These maps may be based on classification models fitted to the sample data. Such "endogenous post-stratification" violates the standard assumptions that observations are classified without error into post-strata and that post-stratum population counts are known. Properties of the endogenous post-stratification estimator (EPSE) are derived for the case of sample-fitted nonparametric models, with particular emphasis on monotone regression models. Asymptotic properties of the nonparametric EPSE are investigated under a superpopulation model framework. Simulation experiments illustrate the practical effects of first fitting a nonparametric model to survey data before post-stratifying.
The second topic explores the use of instrumental variables to estimate regression coefficients. Informative sampling in survey problems occurs when the inclusion probabilities depend on the values of the study variable. In a regression setting under this sampling scheme, ordinary least squares estimators are biased and inconsistent. Given inverse inclusion probabilities as weights for the sample, various consistent estimators can be constructed. In particular, weighted covariates can be used as instrumental variables, allowing for calculation of a consistent, classical two-stage least squares estimator. The proposed estimator uses a similar two-stage process, but with penalized splines at the first stage. Consistency and asymptotic normality of the new estimator are established. The estimator is asymptotically unbiased, but has a finite-sample bias that is analytically characterized. Selection of an optimal smoothing parameter is shown to reduce the finite-sample variance relative to that of the classical two-stage least squares estimator, offsetting the bias and yielding an estimator with a reduced mean squared error.
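As an illustration of the first topic, the endogenous post-stratification idea can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the thesis's method: it assumes simple random sampling (equal weights), a single auxiliary variable known for every population unit, and a straight-line fit standing in for the sample-fitted nonparametric or monotone model. The function name `epse_mean`, the `cutpoints` argument, and the simulated data are hypothetical and introduced here for illustration.

```python
import numpy as np

def epse_mean(x_pop, sample_idx, y_sample, cutpoints):
    """Sketch of an endogenous post-stratification estimate of a population mean.

    Assumptions (not from the source): equal-probability sampling, one
    auxiliary variable x known for the whole population, and a linear fit
    used as a stand-in for a nonparametric or monotone model.
    """
    x_pop = np.asarray(x_pop, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)
    x_sample = x_pop[sample_idx]

    # Step 1: fit the classification model on the SAMPLE data.
    slope, intercept = np.polyfit(x_sample, y_sample, deg=1)

    # Step 2: predict for every population unit and cut the predictions
    # into post-strata. The strata depend on the sample through the fit,
    # which is what makes the post-stratification "endogenous".
    pred_pop = intercept + slope * x_pop
    strata_pop = np.digitize(pred_pop, cutpoints)
    strata_sample = strata_pop[sample_idx]

    # Step 3: weight each post-stratum sample mean by its population share.
    estimate = 0.0
    for h in range(len(cutpoints) + 1):
        pop_share = np.mean(strata_pop == h)
        if pop_share == 0:
            continue  # empty post-stratum contributes nothing
        in_h = strata_sample == h
        if not in_h.any():
            raise ValueError(f"post-stratum {h} has no sampled units")
        estimate += pop_share * y_sample[in_h].mean()
    return estimate

# Hypothetical simulated population, for illustration only.
rng = np.random.default_rng(0)
N, n = 1000, 100
x_pop = rng.uniform(0.0, 1.0, N)
y_pop = 2.0 + 3.0 * x_pop + rng.normal(0.0, 0.5, N)
sample_idx = rng.choice(N, size=n, replace=False)
est = epse_mean(x_pop, sample_idx, y_pop[sample_idx], cutpoints=[3.0, 4.0])
print(est, y_pop.mean())
```

With a single effective post-stratum the estimate reduces to the plain sample mean, which gives a quick sanity check on the weighting step. The cut-points here are fixed in advance. Only the fitted model, and therefore the stratum memberships and counts, varies with the sample.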