
Penalized isotonic regression and an application in survey sampling




Wu, Jiwen, author
Opsomer, Jean D., advisor
Meyer, Mary C., advisor
Breidt, F. Jay, committee member
Doherty, Paul, committee member



In isotonic regression, the mean function is assumed to be monotone increasing (or decreasing) but otherwise unspecified. The classical isotonic least-squares estimator is known to be inconsistent at the boundaries; this is called the spiking problem. A penalty on the range of the regression function is proposed to correct the spiking problem for univariate and multivariate isotonic models. The penalized estimator is shown to be consistent everywhere for a wide range of sizes of the penalty parameter. For the univariate case, the optimal penalty is shown to depend on the derivatives of the true regression function at the boundaries. Pointwise confidence intervals are constructed using the penalized estimator and bootstrapping ideas; these are shown through simulations to behave well in moderate-sized samples. Simulation studies also show that the power of the hypothesis test of a constant versus an increasing regression function improves substantially compared to the power of the test with an unpenalized alternative, and also compares favorably to tests using parametric alternatives. The application of isotonic regression is also considered in the survey context, where many variables contain natural orderings that should be respected in the estimates. For instance, the National Compensation Survey estimates mean wages for many job categories, and these mean wages are expected to be non-decreasing according to job level. In this type of situation, isotonic regression can be applied to give constrained estimators that satisfy the monotonicity. We combine domain estimation and the pooled adjacent violators algorithm to construct new design-weighted constrained estimators. The resulting estimator is the classical design-based domain estimator, but after adaptive pooling of neighboring domains, so that it is both readily implemented in large-scale surveys and easy to explain to data users.
Under mild conditions on the sampling design and the population, the estimators are shown to be design consistent and asymptotically normal. Confidence intervals for domain means using linearization-based and replication-based variance estimation show marked improvements compared to survey estimators that do not incorporate the constraints. Furthermore, a cone projection algorithm is implemented for the domain mean estimate to accommodate qualitative constraints in the case of two covariates. Theoretical properties of the constrained estimators are investigated, and a simulation study is used to demonstrate the improvement in confidence intervals when using the constrained estimate. We also provide a relaxed monotone constraint to loosen the qualitative assumptions, where the extent of departure from monotonicity can be controlled by a weight function and a chosen bandwidth. We compare the unconstrained estimate, the constrained estimate without penalty, the constrained estimate with penalty, and the relaxed constrained estimate. Incorporating the constraints yields confidence intervals with higher coverage rates and smaller widths, and the penalized version fixes the spiking problem at the boundary.
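As an illustration of the adaptive pooling of neighboring domains described in the abstract, the following is a minimal sketch of a weighted pooled adjacent violators pass over ordered domain means. It is not taken from the thesis. The function name `pooled_domain_means`, the use of summed design weights as pooling weights, and the example numbers are all assumptions made for illustration only.

```python
def pooled_domain_means(means, weights):
    """Weighted pooled adjacent violators (illustrative sketch).

    means   -- unconstrained domain mean estimates, in the order in which
               they are expected to be non-decreasing (hypothetical input)
    weights -- positive pooling weights, assumed here to be the sum of the
               design weights of the sampled units in each domain
    Returns a list of non-decreasing constrained estimates.
    """
    blocks = []  # each block: [pooled mean, total weight, number of domains]
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])

    # Expand each pooled block back to one value per original domain.
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted


# Made-up example: domains 2 and 3 violate the ordering and are pooled.
print(pooled_domain_means([10, 14, 12, 20], [100, 50, 150, 80]))
# [10.0, 12.5, 12.5, 20.0]
```

In this sketch each pooled value is simply the weighted mean over the merged domains, which mirrors the abstract's description of the constrained estimator as the usual domain estimator applied after pooling.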

