Repository logo
 

Topics in estimation for messy surveys: imperfect matching and nonprobability sampling

dc.contributor.authorHuang, Chien-Min, author
dc.contributor.authorBreidt, F. Jay, advisor
dc.contributor.authorWang, Haonan, committee member
dc.contributor.authorKeller, Joshua, committee member
dc.contributor.authorPallickara, Sangmi, committee member
dc.date.accessioned2022-08-29T10:17:13Z
dc.date.available2022-08-29T10:17:13Z
dc.date.issued2022
dc.description.abstractTwo problems in estimation for "messy" surveys are addressed, both requiring the combination of survey data with other data sources. The first estimation problem involves the combination of survey data with auxiliary data, when the matching of the two sources is imperfect. Model-assisted survey regression estimators combine auxiliary information available at a population level with complex survey data to estimate finite population parameters. Many prediction methods, including linear and mixed models, nonparametric regression, and machine learning techniques, can be incorporated into such model-assisted estimators. These methods assume that observations obtained for the sample can be matched without error to the auxiliary data. We investigate properties of estimators that rely on matching algorithms that do not in general yield perfect matches. We focus on difference estimators, which are exactly unbiased under perfect matching but not under imperfect matching. The methods are investigated analytically and via simulation, using a study of recreational angling in South Carolina to build a simulation population. In this study, the survey data come from a stratified, two-stage sample and the auxiliary data from logbooks filed by boat captains. Extensions to multiple frame estimators under imperfect matching are discussed. The second estimation problem involves the combination of survey data from a probability sample with additional data from a nonprobability sample. The problem is motivated by an application in which field crews are allowed to use their judgment in selecting part of a sample. Many surveys are conducted in two or more stages, with the first stage of primary sampling units dedicated to screening for secondary sampling units of interest, which are then measured or subsampled. The Large Pelagics Intercept Survey, conducted by the United States National Marine Fisheries Service, draws a probability sample of fishing access site-days in the first stage and screens for relatively rare fishing trips that target pelagic species (tuna, sharks, billfish, etc.). Many site-days yield no pelagic trips. Motivated by this low yield, we consider surveys that allow expert judgment in the selection of some site-days. This nonprobability judgment sample is combined with a probability sample to generate likelihood-based estimates of inclusion probabilities and estimators of population totals that are related to dual-frame estimators. Consistency and asymptotic normality of the estimators are established under the correct specification of the model for judgment behavior. An extensive simulation study shows the robustness of the methodology to misspecification of the judgment behavior. A standard variance estimator, readily available in statistical software, yields stable estimates with small negative bias and good confidence interval coverage. Across a range of conditions, the proposed strategy that allows for some judgment dominates the classic strategy of pure probability sampling with known design weights. The methodology is extended to a doubly-robust version that uses both a propensity model for judgment selection probabilities and a regression model for study variable characteristics. If either model is correctly specified, the doubly-robust estimator is unbiased. The dual-frame methodology for samples incorporating expert judgment is then extended to two other nonprobability settings: respondent-driven sampling and biased-frame sampling.
dc.format.mediumborn digital
dc.format.mediumdoctoral dissertations
dc.identifierHuang_colostate_0053A_17316.pdf
dc.identifier.urihttps://hdl.handle.net/10217/235704
dc.languageEnglish
dc.language.isoeng
dc.publisherColorado State University. Libraries
dc.relation.ispartof2020-
dc.rightsCopyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.titleTopics in estimation for messy surveys: imperfect matching and nonprobability sampling
dc.typeText
dcterms.rights.dplaThis Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.disciplineStatistics
thesis.degree.grantorColorado State University
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy (Ph.D.)

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Huang_colostate_0053A_17316.pdf
Size:
865.79 KB
Format:
Adobe Portable Document Format
Description: