# Theses and Dissertations


### Browsing Theses and Dissertations by Issue Date




#### Estimation and linear prediction for regression, autoregression and ARMA with infinite variance data

Colorado State University. Libraries, 1983. Open Access. Cline, Daren B. H., author; Resnick, Sidney I., advisor; Brockwell, Peter J., advisor; Locker, John, committee member; Davis, Richard A., committee member; Boes, Duane C., committee member.

This dissertation is divided into four parts, each of which considers random variables from distributions with regularly varying tails and/or in a stable domain of attraction. Part I considers the existence of infinite series of an independent sequence of such random variables and relates the probability of large values of the series to the probability of large values of the first component. Part II applies Part I to construct a linear predictor for ARMA time series (again with regularly varying tails); this predictor is designed to minimize the probability of large prediction errors relative to the tails of the noise distribution. Part III investigates products of independent random variables where one factor has a distribution in a stable domain of attraction, and gives conditions under which the product distribution is also in a stable domain of attraction. Part IV considers estimation of the regression parameter in a model whose independent variables are in a stable domain of attraction. Consistency of certain M-estimators is proved. Using portions of Part III, this final part gives necessary and sufficient conditions for consistency of least squares estimators and derives their asymptotic distribution.

#### The pooling of prior distributions via logarithmic and supra-Bayesian methods with application to Bayesian inference in deterministic simulation models

Colorado State University. Libraries, 1998. Open Access. Roback, Paul J., author; Givens, Geof, advisor; Hoeting, Jennifer, committee member; Howe, Adele, committee member; Tweedie, Richard, committee member.

We consider Bayesian inference when priors and likelihoods are both available for inputs and outputs of a deterministic simulation model. Deterministic simulation models are used frequently by scientists to describe natural systems, and the Bayesian framework provides a natural vehicle for incorporating uncertainty in a deterministic model. The problem of making inference about parameters in deterministic simulation models is fundamentally related to the problem of aggregating (i.e., pooling) expert opinion. Alternative strategies for aggregation are surveyed and four approaches are discussed in detail: logarithmic pooling, linear pooling, French-Lindley supra-Bayesian pooling, and Lindley-Winkler supra-Bayesian pooling. The four pooling approaches are compared with respect to three suitability factors: theoretical properties, performance in examples, and the selection and sensitivity of hyperparameters or weightings incorporated in each method. The logarithmic pool is found to be the most appropriate pooling approach when combining expert opinions in the context of deterministic simulation models. We develop an adaptive algorithm for estimating log pooled priors for parameters in deterministic simulation models. Our adaptive estimation approach relies on importance sampling methods, density estimation techniques for which we numerically approximate the Jacobian, and nearest neighbor approximations in cases in which the model is noninvertible. This adaptive approach is compared to a nonadaptive approach over several examples, ranging from a relatively simple R¹ → R¹ example with normally distributed priors and a linear deterministic model to a relatively complex R² → R² example based on the bowhead whale population model.
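For intuition, the logarithmic pool just described (a renormalized, weighted geometric mean of the expert densities) can be sketched numerically. The two normal "expert" priors and the equal weights below are hypothetical, not taken from the dissertation:

```python
import numpy as np

def log_pool(densities, weights, dx):
    """Logarithmic opinion pool: renormalized weighted geometric mean
    of the expert densities, evaluated on a common uniform grid."""
    log_p = sum(w * np.log(d) for w, d in zip(weights, densities))
    p = np.exp(log_p - log_p.max())   # subtract the max for numerical stability
    return p / (p.sum() * dx)         # renormalize to integrate to 1

# Two hypothetical expert priors: N(0, 1) and N(2, 1.5^2), equal weights
grid = np.linspace(-8.0, 10.0, 2001)
dx = grid[1] - grid[0]

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

pooled = log_pool([norm_pdf(grid, 0.0, 1.0), norm_pdf(grid, 2.0, 1.5)],
                  [0.5, 0.5], dx)
```

For normal experts the log pool is again normal, with precision equal to the weighted sum of the expert precisions, which is one reason it behaves so well in practice.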
In each case, our adaptive approach leads to better and more efficient estimates of the log pooled prior than the nonadaptive estimation algorithm. Finally, we extend our inferential ideas to a higher-dimensional, realistic model for AIDS transmission. This dissertation makes several unique contributions to the statistical discipline: (1) the application of logarithmic pooling to inference in deterministic simulation models; (2) the algorithm for estimating log pooled priors using an adaptive strategy; (3) the Jacobian-based approach to density estimation in this context, especially in higher dimensions; (4) the extension of the French-Lindley supra-Bayesian methodology to continuous parameters; (5) the extension of the Lindley-Winkler supra-Bayesian methodology to multivariate parameters; and (6) the proofs and illustrations of the failure of Relative Propensity Consistency under the French-Lindley supra-Bayesian approach.

#### Nonparametric function smoothing: fiducial inference of free knot splines and ecological applications

Colorado State University. Libraries, 2010. Open Access. Sonderegger, Derek Lee, author; Wang, Haonan, advisor; Hannig, Jan, advisor; Noon, Barry R. (Barry Richard), 1949-, committee member; Iyer, Hariharan K., committee member.

Nonparametric function estimation has proven to be a useful tool for applied statisticians. Classic techniques such as locally weighted regression and smoothing splines are being used in a variety of circumstances to address questions at the forefront of ecological theory. We first examine an ecological threshold problem, defining a threshold as the point where the derivative of the estimated function changes state (negative, possibly zero, or positive), and present a graphical method that examines these state changes across a wide interval of smoothing levels. We apply this method to macroinvertebrate data from the Arkansas River.
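The derivative-state idea can be mimicked with any off-the-shelf smoother: fit, differentiate, classify the derivative as negative/zero/positive, and flag where the classification changes. A minimal sketch, in which the simulated data, the smoothing level, and the ±0.05 dead band are all illustrative choices rather than the dissertation's graphical method:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
# Flat response up to x = 4, then a linear decline, plus noise
y = np.where(x < 4, 1.0, 1.0 - 0.5 * (x - 4)) + rng.normal(0, 0.1, x.size)

spline = UnivariateSpline(x, y, s=2.0)       # smoothing level is a tuning choice
deriv = spline.derivative()(x)

# Classify the derivative into states -1 / 0 / +1 using a small dead band
state = np.sign(np.where(np.abs(deriv) < 0.05, 0.0, deriv))
changes = x[1:][np.diff(state) != 0]          # candidate threshold locations
```

Scanning `changes` across a grid of smoothing levels `s` is the spirit of the graphical method described above.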
Next we investigate a measurement error model and a generalization of the commonly used regression calibration method in which a nonparametric function replaces the linear function. We present a simulation study to assess the effectiveness of the method and apply it to a water quality monitoring data set. The possibility of defining thresholds as knot point locations in smoothing splines led to an investigation of the fiducial distribution of free-knot splines. After introducing the theory behind fiducial inference, we derive conditions sufficient for asymptotic normality of the multivariate fiducial density. We then derive the fiducial density for a spline of arbitrary degree with an arbitrary number of knot points, and show that free-knot splines of degree 3 or greater satisfy the asymptotic normality conditions. Finally, we conduct a simulation study to assess the quality of the fiducial solution compared to three other commonly used methods.

#### Saddlepoint approximation to functional equations in queueing theory and insurance mathematics

Colorado State University. Libraries, 2010. Open Access. Chung, Sunghoon, author; Butler, Ronald W., advisor; Scharf, Louis L., committee member; Chapman, Phillip L., committee member; Hoeting, Jennifer A. (Jennifer Ann), 1966-, committee member.

We study the application of saddlepoint approximations to statistical inference when the moment generating function (MGF) of the distribution of interest is an explicit or an implicit function of the MGF of another random variable which is assumed to be observed. In other words, let W(s) be the MGF of the random variable W of interest. We study the case when W(s) = h{G(s); λ}, where G(s) is the MGF of a random variable G for which a random sample can be obtained, and h is a smooth function. If Ĝ(s) estimates G(s), then Ŵ(s) = h{Ĝ(s); λ̂} estimates W(s).
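The plug-in idea Ŵ(s) = h{Ĝ(s); λ̂} is easy to illustrate with the empirical MGF. The exponential sample and the smooth map `h` below are hypothetical stand-ins, not the Pollaczek-Khinchin form used in the dissertation:

```python
import numpy as np

rng = np.random.default_rng(7)
g_sample = rng.exponential(scale=2.0, size=5000)   # observed sample from G

def empirical_mgf(sample, s):
    """Empirical MGF: the average of exp(s * G_i) over the sample."""
    return np.mean(np.exp(s * sample))

# Hypothetical smooth map h(g; lam); any smooth h works for the plug-in idea
lam_hat = 0.5
s = 0.1
g_hat = empirical_mgf(g_sample, s)   # estimates G(0.1) = 1/(1 - 2*0.1) = 1.25
w_hat = lam_hat * g_hat / (1.0 + lam_hat * g_hat)   # W-hat(s) = h{G-hat(s); lam-hat}
```

Saddlepoint inversion then turns Ŵ(s), evaluated along the real axis, into density and CDF approximations for W.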
Generally, it can be shown that Ŵ(s) converges to W(s) by the strong law of large numbers, which implies that F̂(t), the cumulative distribution function (CDF) corresponding to Ŵ(s), converges to F(t), the CDF of W, almost surely. If we set Ŵ*(s) = h{Ĝ*(s); λ̂*}, where Ĝ*(s) and λ̂* are the empirical MGF and the estimator of λ from bootstrapping, the corresponding CDF F̂*(t) can be used to construct a confidence band for F(t). In this dissertation, we show that the saddlepoint inversion of Ŵ(s) is not only fast, reliable, stable, and accurate enough for general statistical inference, but also easy to use without deep knowledge of the probability theory governing the stochastic process of interest. In the first part, we consider nonparametric estimation of the density and the CDF of the stationary waiting times W and Wq of an M/G/1 queue. These estimates are computed using saddlepoint inversion of Ŵ(s) determined from the Pollaczek-Khinchin formula. Our saddlepoint estimation is compared with estimators based on other approximations, including the Cramér-Lundberg approximation. In the second part, we consider the saddlepoint approximation for the busy period distribution FB(t) in an M/G/1 queue. The busy period B is the first passage time for the queueing system to pass from an initial arrival (one customer in the system) to an empty system. If B(s) is the MGF of B, then B(s) is an implicitly defined function of G(s) and λ, the inter-arrival rate, through the well-known Kendall-Takács functional equation. As in the first part, we show that the saddlepoint approximation can be used to obtain F̂B(t), the CDF corresponding to B̂(s), and simulation results show that bootstrap confidence bands for FB(t) perform well.

#### A fiducial approach to extremes and multiple comparisons

Colorado State University. Libraries, 2010. Open Access. Wandler, Damian V., author; Hannig, Jan, advisor; Iyer, Hariharan K., advisor; Chong, Edwin Kah Pin, committee member; Wang, Haonan, committee member.

Generalized fiducial inference is a powerful tool for many difficult problems. Building on an extension of R. A. Fisher's work, we use generalized fiducial inference for two extreme value problems and a multiple comparison procedure. The first extreme value problem deals with the generalized Pareto distribution, which is relevant to many situations when modeling extremes of random variables. We use a fiducial framework to perform inference on the parameters and the extreme quantiles of the generalized Pareto, and demonstrate the technique both when the threshold is known and when it is unknown. Simulation results suggest good empirical properties, and the method compares favorably to similar Bayesian and frequentist methods. The second extreme value problem pertains to the largest mean of a multivariate normal distribution. Difficulties arise when two or more of the means are simultaneously the largest. Our solution uses a generalized fiducial distribution that allows for equal largest means, alleviating the overestimation that commonly occurs. Theoretical calculations, simulation results, and an application suggest our solution possesses promising asymptotic and empirical properties. Our solution to the largest mean problem arose from our ability to identify the correct largest mean(s), which is essentially a model selection problem. As a result, we apply a similar model selection approach to the multiple comparison problem, allowing for all possible groupings (of equality) of the means of k independent normal distributions.
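The groupings in question are the set partitions of the k means, whose count grows as the Bell numbers. A minimal enumeration sketch (illustrative only; the fiducial probabilities over these groupings are not computed here):

```python
def partitions(elements):
    """Recursively enumerate all set partitions (groupings of equality)
    of a list of labels."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + part

# k = 4 means: Bell number B_4 = 15 candidate groupings
groupings = list(partitions([1, 2, 3, 4]))
```

Assigning a fiducial probability to each partition and selecting the most probable one is the model selection step described above.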
Our resulting fiducial probability for the groupings of the means demonstrates the effectiveness of our method by selecting the correct grouping at a high rate.

#### Improved estimation for complex surveys using modern regression techniques

Colorado State University. Libraries, 2011. Open Access. McConville, Kelly, author; Breidt, F. Jay, advisor; Lee, Thomas C. M., advisor; Opsomer, Jean, committee member; Lee, Myung-Hee, committee member; Doherty, Paul F., committee member.

In the field of survey statistics, finite population quantities are often estimated based on complex survey data. In this thesis, estimation of the finite population total of a study variable is considered. The study variable is available for the sample and is supplemented by auxiliary information, which is available for every element in the finite population. Following a model-assisted framework, estimators are constructed that exploit the relationship which may exist between the study variable and ancillary data. These estimators have good design properties regardless of model accuracy. Nonparametric survey regression estimation is applicable in natural resource surveys where the relationship between the auxiliary information and study variable is complex and of an unknown form. Breidt, Claeskens, and Opsomer (2005) proposed a penalized spline survey regression estimator and studied its properties when the number of knots is fixed. Building on their work, the asymptotic properties of the penalized spline regression estimator are considered when the number of knots goes to infinity and the locations of the knots are allowed to change. The estimator is shown to be design consistent and asymptotically design unbiased. In the course of the proof, a result is established on the uniform convergence in probability of the survey-weighted quantile estimators. This result is obtained by deriving a survey-weighted Hoeffding inequality for bounded random variables.
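A toy version of a survey-weighted quantile estimator inverts a design-weighted empirical CDF (Hájek-style). The data and weights below are made up for illustration:

```python
import numpy as np

def weighted_quantile(y, w, p):
    """Survey-weighted quantile: invert the weighted empirical CDF built
    from design weights w (inverse inclusion probabilities)."""
    order = np.argsort(y)
    y, w = np.asarray(y, float)[order], np.asarray(w, float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return y[np.searchsorted(cdf, p)]

y = [3.0, 1.0, 4.0, 2.0]
w = [1.0, 1.0, 1.0, 5.0]   # the unit with y = 2 represents 5 population units
median = weighted_quantile(y, w, 0.5)   # the weighted median is pulled to 2
```

Uniform convergence of such estimators over p is the property the survey-weighted Hoeffding inequality delivers.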
A variance estimator is proposed and shown to be design consistent for the asymptotic mean squared error. Simulation results demonstrate the usefulness of the asymptotic approximations.

Also in natural resource surveys, a substantial amount of auxiliary information, typically derived from remotely sensed imagery and organized as spatial layers in a geographic information system (GIS), is available. Some of this ancillary data may be extraneous, and a sparse model would be appropriate; model selection methods are therefore warranted. The 'least absolute shrinkage and selection operator' (lasso), presented by Tibshirani (1996), conducts model selection and parameter estimation simultaneously by penalizing the sum of the absolute values of the model coefficients. A survey-weighted lasso criterion, which accounts for the sampling design, is derived and a survey-weighted lasso estimator is presented. The root-n design consistency of the estimator and a central limit theorem are proved. Several variants of the survey-weighted lasso estimator are constructed. In particular, a calibration estimator and a ridge regression approximation estimator are constructed to produce lasso weights that can be applied to several study variables. Simulation studies show the lasso estimators are more efficient than the regression estimator when the true model is sparse. The lasso estimators are used to estimate the proportion of tree canopy cover for a region of Utah. Under a joint design-model framework, the survey-weighted lasso coefficients are shown to be root-N consistent for the parameters of the superpopulation model and a central limit theorem result is found. The methodology is applied to estimate risk factors for the Zika virus from an epidemiological survey on the island of Yap.
A logistic survey-weighted lasso regression model is fit to the data and important covariates are identified.

#### Bayesian shape-restricted regression splines

Colorado State University. Libraries, 2011. Open Access. Hackstadt, Amber J., author; Hoeting, Jennifer, advisor; Meyer, Mary, advisor; Opsomer, Jean, committee member; Huyvaert, Kate, committee member.

Semi-parametric and non-parametric function estimation are useful tools to model the relationship between design variables and response variables and to make predictions without assuming a parametric form for the regression function. Additionally, Bayesian methods have become increasingly popular in statistical analysis because they provide a flexible framework for constructing complex models and produce a joint posterior distribution for the coefficients that allows for inference through various sampling methods. We use non-parametric function estimation and a Bayesian framework to estimate regression functions with shape restrictions. Shape-restricted functions include functions that are monotonically increasing, monotonically decreasing, convex, concave, and combinations of these restrictions, such as increasing and convex. Shape restrictions allow researchers to incorporate knowledge about the relationship between variables into the estimation process. We propose Bayesian semi-parametric models for regression analysis under shape restrictions that use a linear combination of shape-restricted regression splines such as I-splines or C-splines. We find function estimates using Markov chain Monte Carlo (MCMC) algorithms. The Bayesian framework along with MCMC allows us to perform model selection and produce uncertainty estimates much more easily than in the frequentist paradigm. Indeed, some of the work proposed in this dissertation has not been developed in parallel in the frequentist paradigm.
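The role of I-splines can be seen in a frequentist miniature: each basis function is nondecreasing, so constraining the coefficients to be nonnegative forces the fitted curve to be monotone. The sketch below uses degree-1 I-splines (ramp-then-plateau functions) and nonnegative least squares as a stand-in for the dissertation's Bayesian machinery; the data and knots are invented:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 100)
y = np.sqrt(x) + rng.normal(0, 0.05, x.size)    # noisy increasing signal

# Degree-1 I-spline basis: each column rises from 0 to 1 and then plateaus,
# so any nonnegative combination (plus an intercept) is nondecreasing.
knots = np.linspace(0.0, 1.0, 9)
h = knots[1] - knots[0]
cols = [np.clip((x - k) / h, 0.0, 1.0) for k in knots[:-1]]
basis = np.column_stack([np.ones_like(x)] + cols)

coef, _ = nnls(basis, y)    # nonnegative coefficients => monotone estimate
fit = basis @ coef
```

In the Bayesian version, the nonnegativity constraint is imposed through the prior on the spline coefficients and the fit is a posterior summary rather than a least squares solution.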
We begin by proposing a semi-parametric generalized linear model for regression analysis under shape restrictions. We provide Bayesian shape-restricted regression spline (Bayes SRRS) models and MCMC estimation algorithms for the normal errors, Bernoulli, and Poisson models. We propose several types of inference that can be performed for the normal errors model and examine the asymptotic behavior of its estimates under the monotone shape restriction. We also examine the small-sample behavior of the proposed Bayes SRRS model estimates via simulation studies. We then extend the semi-parametric Bayesian shape-restricted regression splines to generalized linear mixed models, providing an MCMC algorithm to estimate functions for the random intercept model with normal errors under the monotone shape restriction. We further extend the approach to allow the number and location of the knot points for the regression splines to be random, and propose a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm for regression function estimation under the monotone shape restriction. Lastly, we propose a Bayesian shape-restricted regression spline change-point model where the regression function is shape-restricted except at the change-points. We provide RJMCMC algorithms to estimate functions with change-points where the number and location of interior knot points for the regression splines are random, including an algorithm to estimate the location of an unknown change-point and one to decide between a model with no change-points and a model with one change-point.

#### Habitat estimation through synthesis of species presence/absence information and environmental covariate data

Colorado State University. Libraries, 2011. Open Access. Dornan, Grant J., author; Givens, Geof H., advisor; Hoeting, Jennifer A., committee member; Chapman, Phillip L., committee member; Myrick, Christopher A., committee member.

This paper investigates the statistical model developed by Foster et al. (2011) for estimating marine habitat maps from environmental covariate data and species presence/absence information while treating habitat definition probabilistically. The model assumes that two sites belonging to the same habitat have approximately the same species presence probabilities, so both environmental data and species presence observations can help to distinguish habitats at locations across a study region. I develop a computational method to estimate the model parameters by maximum likelihood using a blocked non-linear Gauss-Seidel algorithm. The main part of my work is developing and conducting simulation studies to evaluate estimation performance and to study related questions, including the impacts of sample size, model bias, and model misspecification. Seven testing scenarios are developed, with between 3 and 9 habitats, 15 and 40 species, and 150 and 400 sampling sites. Estimation performance is primarily evaluated through fitted habitat maps and is shown to be excellent for the seven example scenarios examined: rates of successful habitat classification ranged from 0.92 to 0.98. I show that there is a roughly balanced tradeoff between increasing the number of sites and increasing the number of species for improving estimation performance. Standard model selection techniques are shown to work for selection of covariates, but selection of the number of habitats benefits from supplementing quantitative techniques with qualitative expert judgement. Although estimation of habitat boundaries is extremely good, the rate of probabilistic transition between habitats is shown to be difficult to estimate accurately; future research should address this issue.
An appendix to this thesis includes a comprehensive and annotated collection of R code developed during this project.

#### Topics in design-based and Bayesian inference for surveys

Colorado State University. Libraries, 2012. Open Access. Hernandez-Stumpfhauser, Daniel, author; Opsomer, Jean, advisor; Breidt, F. Jay, committee member; Hoeting, Jennifer A., committee member; Kreidenweis, Sonia M., committee member.

We deal with two different topics in statistics. The first topic, in survey sampling, concerns variance and variance estimation of estimators of model parameters under the design-based approach to analytical inference for survey data, when sampling weights include post-sampling weight adjustments such as calibration. Under the design-based approach, estimators of model parameters, if available in closed form, are written as functions of estimators of population totals and means. We examine properties of these estimators, in particular their asymptotic variances, and show how ignoring the post-sampling weight adjustments, i.e. treating sampling weights as inverses of inclusion probabilities, results in biased variance estimators. Two simple simulation studies for two common estimators, an estimator of a population ratio and an estimator of regression coefficients, illustrate situations in which ignoring the post-sampling weight adjustments results in significantly biased variance estimators. For the second topic, we consider Bayesian inference for directional data using the projected normal distribution. We show how the models can be estimated using Markov chain Monte Carlo methods after the introduction of suitable latent variables. The cases of random samples, regression, model comparison, and Dirichlet process mixture models are covered, motivated by a very large dataset of daily departures of anglers. The number of parameters increases with sample size, hence the need to explore alternatives.
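The projected normal construction is easy to simulate: a direction is the angle of a latent bivariate normal draw, and the discarded length r is exactly the latent variable that makes MCMC tractable. A sketch with a hypothetical mean parameter:

```python
import numpy as np

rng = np.random.default_rng(11)
mu = np.array([1.5, 0.5])                 # hypothetical mean parameter

# Projected normal: draw z ~ N(mu, I) and keep only its direction
z = rng.normal(size=(10000, 2)) + mu
theta = np.arctan2(z[:, 1], z[:, 0])      # observed angles in [-pi, pi)

# The latent length r recovers z from the angle: z = r * (cos t, sin t).
# MCMC for this model alternates draws of r with draws of mu.
r = np.linalg.norm(z, axis=1)
u = z / r[:, None]                        # points on the unit circle
```

Given the angles alone, augmenting each observation with its latent r restores a Gaussian likelihood for mu, which is what the latent-variable samplers described above exploit.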
We explore mean field variational methods and identify a number of problems in the application of the method to these models, caused by the poor approximation of the variational distribution to the posterior distribution. We propose solutions to those problems by improving the mean field variational approximation through the Laplace approximation for the regression case and through novel Monte Carlo procedures for the mixture model case.

#### Using community detection on networks to identify migratory bird flyways in North America

Colorado State University. Libraries, 2012. Open Access. Buhnerkempe, Michael G., author; Hoeting, Jennifer A., advisor; Givens, Geof H., committee member; Webb, Colleen T., committee member.

Migratory behavior of waterfowl populations in North America has traditionally been broadly characterized by four north-south flyways, and these flyways have been central to the management of waterfowl populations for more than 80 years. However, recent desires to incorporate uncertainty regarding biological processes into an adaptive harvest management program have underscored the need to re-evaluate the traditional flyway concept and bring uncertainty in the flyways themselves into management planning. Here, we use bird band and recovery data to develop a network model of migratory movement for four waterfowl species in North America: mallard (Anas platyrhynchos), northern pintail (A. acuta), American green-winged teal (A. carolinensis), and Canada goose (Branta canadensis). A community detection algorithm is then used to identify migratory flyways. Additionally, we compare flyway structure across species and through time to determine the broad applicability of the previous flyway concept. We also propose a novel metric, the consolidation factor, to describe the importance of a node (i.e., a small geographic area) in determining flyway structure.
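The network idea can be sketched on synthetic data: nodes are geographic areas, edge weights are band-recovery counts between them, and a community detection algorithm partitions the nodes into flyways. The sketch below uses simple label propagation as a stand-in for the algorithm in the study, on an invented two-flyway movement matrix:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical band-recovery counts: nodes 0-3 and 4-7 form two flyways,
# with heavy within-group movement and light between-group movement.
A = np.zeros((8, 8))
for i in range(8):
    for j in range(8):
        if i != j:
            same = (i < 4) == (j < 4)
            A[i, j] = rng.poisson(20 if same else 1)
A = A + A.T                                  # symmetrize recoveries

def label_propagation(A, iters=50, seed=0):
    """Toy community detection: each node repeatedly adopts the label
    carrying the largest total edge weight among its neighbors."""
    order_rng = np.random.default_rng(seed)
    labels = np.arange(A.shape[0])
    for _ in range(iters):
        for i in order_rng.permutation(A.shape[0]):
            weight_per_label = np.bincount(labels, weights=A[i],
                                           minlength=len(labels))
            labels[i] = int(np.argmax(weight_per_label))
    return labels

labels = label_propagation(A)
```

On real band-recovery networks the detected communities play the role of flyways, and nodes whose neighbors split across labels are the candidates for mixing zones.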
The community detection algorithm identified four main flyways for mallards, northern pintails, and American green-winged teal, with the flyway structure of Canada geese exhibiting higher complexity. For mallards, flyway structure was relatively consistent through time. However, consolidation factors and cross-community mixing patterns revealed that, for mallards and green-winged teal, the presumptive Mississippi flyway was potentially a zone of high mixing between flyways. Additionally, interspersed throughout these major flyways were smaller mixing zones that point to added complexity and uncertainty in the four-flyway concept. Not only does the incorporation of this uncertainty due to mixing provide a potential alternative management strategy, but the network approach provides a robust, quantitative approach to flyway identification that fits well with the adaptive harvest management framework currently used in North American waterfowl management.

#### Survey sampling with nonparametric methods: endogenous post-stratification and penalized instrumental variables

Colorado State University. Libraries, 2012. Open Access. Dahlke, Mark, author; Breidt, F. Jay, advisor; Opsomer, Jean, committee member; Lee, Myung-Hee, committee member; Pezeshki, Ali, committee member.

Two topics related to the common theme of nonparametric techniques in survey sampling are examined. The first topic explores the estimation of a finite population mean via post-stratification. Post-stratification is used to improve the precision of survey estimators when categorical auxiliary information is available from external sources. In natural resource surveys, such information may be obtained from remote sensing data classified into categories and displayed as maps. These maps may be based on classification models fitted to the sample data.
Such "endogenous post-stratification" violates the standard assumptions that observations are classified without error into post-strata and that post-stratum population counts are known. Properties of the endogenous post-stratification estimator (EPSE) are derived for the case of sample-fitted nonparametric models, with particular emphasis on monotone regression models. Asymptotic properties of the nonparametric EPSE are investigated under a superpopulation model framework. Simulation experiments illustrate the practical effects of first fitting a nonparametric model to survey data before post-stratifying. The second topic explores the use of instrumental variables to estimate regression coefficients. Informative sampling in survey problems occurs when the inclusion probabilities depend on the values of the study variable. In a regression setting under this sampling scheme, ordinary least squares estimators are biased and inconsistent. Given inverse inclusion probabilities as weights for the sample, various consistent estimators can be constructed. In particular, weighted covariates can be used as instrumental variables, allowing for calculation of a consistent, classical two-stage least squares estimator. The proposed estimator uses a similar two-stage process, but with penalized splines at the first stage. Consistency and asymptotic normality of the new estimator are established. The estimator is asymptotically unbiased but has a finite-sample bias that is analytically characterized. Selection of an optimal smoothing parameter is shown to reduce the finite-sample variance relative to the classical two-stage least squares estimator, offsetting the bias and providing an estimator with reduced mean square error.

#### Parametric and semiparametric model estimation and selection in geostatistics

Colorado State University. Libraries, 2012. Open Access. Chu, Tingjin, author; Wang, Haonan, advisor; Zhu, Jun, advisor; Meyer, Mary, committee member; Luo, J. Rockey, committee member.

This dissertation is focused on geostatistical models, which are useful in many scientific disciplines, such as climatology, ecology, and environmental monitoring. In the first part, we consider variable selection in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE), which enables simultaneous variable selection and parameter estimation, is developed; for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample sizes, we propose penalized maximum covariance-tapered likelihood estimation (PMLET) and its one-step sparse estimation (OSET). General forms of penalty functions, with an emphasis on the smoothly clipped absolute deviation penalty, are used for penalized maximum likelihood. Theoretical properties of PMLE and OSE, as well as their covariance-tapered approximations PMLET and OSET, are derived, including consistency, sparsity, asymptotic normality, and the oracle properties. For covariance tapering, a by-product of our theoretical results is the consistency and asymptotic normality of maximum covariance-tapered likelihood estimates. Finite-sample properties of the proposed methods are demonstrated in a simulation study and, for illustration, the methods are applied to two real data sets. In the second part, we develop a new semiparametric approach to geostatistical modeling and inference. In particular, we consider a geostatistical model with additive components, in which the covariance function of the spatial random error is not pre-specified and is thus flexible. A novel local Karhunen-Loève expansion is developed and a likelihood-based method devised for estimating the model parameters.
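The smoothly clipped absolute deviation (SCAD) penalty emphasized in the first part has a known closed form (Fan and Li, 2001), with the conventional choice a = 3.7; the grid of coefficient values below is purely illustrative:

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: L1 near zero, quadratic blend in the middle,
    constant beyond a*lam (so large coefficients are not shrunk)."""
    b = np.abs(beta)
    small = lam * b
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    large = lam**2 * (a + 1) / 2
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large))

lam = 1.0
grid = np.array([0.0, 0.5, 1.0, 2.0, 10.0])
vals = scad_penalty(grid, lam)
```

Because the penalty flattens out, SCAD yields sparse estimates while leaving large coefficients nearly unbiased, which is what makes the oracle properties cited above attainable.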
In addition, statistical inference, including spatial interpolation and variable selection, is considered. Our proposed computational algorithm utilizes Newton-Raphson iteration on a Stiefel manifold and is computationally efficient. A simulation study demonstrates sound finite-sample properties, and a real data example illustrates the method. While the numerical results are comparable to maximum likelihood estimation under the true model, our method is shown to be more robust against model misspecification and computationally far more efficient for larger sample sizes. Finally, the theoretical properties of the estimates are explored and, in particular, a consistency result is established.

#### Statistical models for animal movement and landscape connectivity

Colorado State University. Libraries, 2013. Open Access. Hanks, Ephraim M., author; Hooten, Mevin B., advisor; Hoeting, Jennifer, committee member; Wang, Haonan, committee member; Alldredge, Mat, committee member; Theobald, David, committee member.

This dissertation considers statistical approaches to the study of animal movement behavior and landscape connectivity, with particular attention paid to modeling how movement and connectivity are influenced by landscape characteristics. For animal movement data, a novel continuous-time, discrete-space model of animal movement is proposed. This model yields increased computational efficiency relative to existing discrete-space models for animal movement, and a more flexible modeling framework than existing continuous-space models. In landscape genetic approaches to landscape connectivity, spatially referenced genetic allele data are used to study landscape effects on gene flow. An explicit link is described between a common circuit-theoretic approach to landscape genetics and variogram fitting for Gaussian Markov random fields.
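A continuous-time, discrete-space movement model of the kind mentioned above is, at heart, a continuous-time Markov chain whose generator encodes landscape effects. A toy sketch, in which the one-dimensional landscape, the covariate, and the log-linear rate link are all invented for illustration:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 1-D landscape of 4 cells; movement rates between neighboring
# cells depend on a covariate (e.g., habitat quality) in the destination cell.
quality = np.array([0.2, 1.0, 1.0, 0.2])

Q = np.zeros((4, 4))
for i in range(4):
    for j in (i - 1, i + 1):
        if 0 <= j < 4:
            Q[i, j] = np.exp(quality[j])   # higher quality, higher rate in
    Q[i, i] = -Q[i].sum()                  # generator rows must sum to zero

P = expm(Q * 0.5)   # transition probabilities over 0.5 time units
```

Fitting such a model means estimating the coefficients in the rate function from telemetry data; the discrete state space is what keeps the likelihood computations cheap.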
A hierarchical model for landscape genetic data is also proposed, with a multinomial data model and latent spatial random effects to model spatial correlation.

Item Open Access Spatial probit models for multivariate ordinal data: computational efficiency and parameter identifiability (Colorado State University. Libraries, 2013) Schliep, Erin M., author; Hoeting, Jennifer, advisor; Cooley, Daniel, committee member; Lee, Myung Hee, committee member; Webb, Colleen, committee member
The Colorado Natural Heritage Program (CNHP) at Colorado State University evaluates Colorado's rare and at-risk species and habitats and promotes conservation of biological resources. One of the goals of the program is to determine the condition of wetlands across the state of Colorado. The data collected are measurements, or metrics, representing landscape condition, biotic condition, hydrologic condition, and physiochemical condition in river basins statewide. The metrics differ in variable type, including binary, ordinal, count, and continuous response data. It is common practice to uniformly discretize the metrics into ordinal values and combine them using a weighted average to obtain a univariate measure of wetland condition. The weights assigned to each metric are based on best professional judgement. The motivation of this work was to improve on the user-defined weights by developing a statistical model to estimate the weights using observed data. The challenges of creating a model that fulfills this requirement are many. First, the observed data are multivariate and consist of different variable types which we wish to preserve. Second, the multivariate response data are not independent across river basins because wetlands at close proximity are correlated. Third, we want the model to provide a univariate measure of wetland condition that can be compared across the state.
Lastly, it is of interest to the ecologists to predict the univariate measure of wetland condition at unobserved locations, requiring covariate information to be incorporated into the model. We propose a multivariate multilevel latent variable model to address these challenges. Latent continuous response variables are used to model the different types of response variables. An additional latent variable, or common factor, is used as a univariate measure of wetland condition. The mean of the common factor contains observable covariate data in order to predict at unobserved locations. The variance of the common factor is defined by a spatial covariance function to account for the dependence between wetlands. The majority of the metrics reported by the CNHP are ordinal. Therefore, our primary focus is modeling multivariate ordinal response data, where binary data is a special case. Probit linear models and probit linear mixed models are examples of models for ordinal response data. Probit models are attractive in that they can be defined in terms of latent variables. Computational efficiency is a major issue when fitting multivariate latent variable models in a Bayesian framework using Markov chain Monte Carlo (MCMC). There is also a high computational cost for running MCMC when fitting geostatistical spatial models. Data augmentation and parameter expansion are both modeling techniques that can lead to optimal iterative sampling algorithms for MCMC. Data augmentation allows for simpler and more feasible simulation from a posterior distribution. Parameter expansion is a method for accelerating convergence of iterative sampling algorithms and can enhance data augmentation algorithms. We propose data augmentation and parameter-expanded data augmentation algorithms for fitting spatial probit models to binary and ordinal response data via MCMC.
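To make the data augmentation idea concrete, here is a minimal sketch of the classic Albert-Chib Gibbs sampler for a non-spatial binary probit model (a flat prior on the coefficients and rejection sampling for the truncated normals are simplifying assumptions; the dissertation's spatial and parameter-expanded algorithms build on this basic scheme):

```python
import numpy as np

def probit_gibbs(X, y, n_iter=200, seed=0):
    """Albert-Chib data augmentation for a Bayesian binary probit model.

    Each latent z_i is drawn from N(x_i' beta, 1) truncated to the positive
    half-line if y_i = 1 and the negative half-line if y_i = 0; beta then
    has a conjugate normal full conditional (flat prior assumed here).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        mu = X @ beta
        z = np.empty(n)
        for i in range(n):
            # rejection sampling of the truncated normal; adequate
            # when |mu_i| is moderate, as in this sketch
            while True:
                cand = rng.normal(mu[i], 1.0)
                if (cand > 0) == bool(y[i]):
                    z[i] = cand
                    break
        # conjugate update: beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
        mean = XtX_inv @ (X.T @ z)
        beta = rng.multivariate_normal(mean, XtX_inv)
        draws[it] = beta
    return draws
```

Because every full conditional has a known form, no Metropolis tuning is needed; parameter expansion then targets the slow mixing this basic sampler can exhibit.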
Parameter identifiability is another challenge when fitting multivariate latent variable models due to the multivariate response data, the number of parameters, unobserved latent variables, and spatial random effects. We investigate parameter identifiability for the common factor model for multivariate ordinal response data. We extend the common factor model to include covariates and spatial correlation so we can predict wetland condition at unobserved locations. The partial sill and range parameter of a spatial covariance function are difficult to estimate because they are near-nonidentifiable. We propose a new parameterization for the covariance function of the spatial probit model that leads to better mixing and faster convergence of the MCMC. Whereas our spatial probit model for ordinal response data follows the common factor model approach, there are other forms of the spatial probit model. We give a comprehensive comparison of two types of spatial probit models, which we refer to as the first-stage and second-stage spatial probit models. We discuss the implications of fitting each model and compare them in terms of their impact on parameter estimation and prediction at unobserved locations. We propose a new approximation for predicting ordinal response data that is both accurate and efficient. We apply the multivariate multilevel latent variable model to data collected in the North Platte and Rio Grande River Basins to evaluate wetland condition. We obtain statistically derived weights for each of the response metrics with confidence limits. Lastly, we predict the univariate measure of wetland condition at unobserved locations.

Item Open Access Linear system design for compression and fusion (Colorado State University. Libraries, 2013) Wang, Yuan, author; Wang, Haonan, advisor; Scharf, Louis L., advisor; Breidt, F. Jay, committee member; Luo, Rockey J., committee member
This is a study of measurement compression and fusion design.
The idea common to both problems is that measurements can often be linearly compressed into lower-dimensional spaces without introducing too much excess mean-squared error or excess volume in a concentration ellipse. The question is how to design the compression to minimize the excesses at any given dimension. The first part of this work is motivated by sensing and wireless communication, where data compression or dimension reduction may be used to reduce the required communication bandwidth. The high-dimensional measurements are converted into low-dimensional representations through linear compression. Our aim is to compress a noisy measurement, allowing for the fact that the compressed measurement will be transmitted over a noisy channel. We review optimal compression with no transmission noise and show its connection with canonical coordinates. When the compressed measurement is transmitted with noise, we give the closed-form expression for the optimal compression matrix with respect to the trace and determinant of the error covariance matrix. We show that the solutions are canonical coordinate solutions, scaled by coefficients which account for canonical correlations and transmission noise variance, followed by a coordinate transformation into the sub-dominant invariant subspace of the channel noise. The second part of this work is a problem of integrating multiple sources of measurements. We consider two multiple-input-multiple-output channels, a primary channel and a secondary channel, with dependent input signals. The primary channel carries the signal of interest, and the secondary channel carries a signal that shares a joint distribution with the primary signal. The problem of particular interest is designing the secondary channel, with a fixed primary channel. We formulate the problem as an optimization problem, in which the optimal secondary channel maximizes an information-based criterion. An analytic solution is provided in a special case. 
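To illustrate the flavor of the compression problem in its simplest setting (a toy sketch, not the canonical-coordinate construction with transmission noise developed here): for a zero-mean measurement, a rank-r compressor built from the top eigenvectors of the covariance minimizes the trace of the reconstruction error covariance over all rank-r linear maps, and the excess mean-squared error is exactly the sum of the discarded eigenvalues.

```python
import numpy as np

def pca_compressor(cov, r):
    """Rank-r linear compressor from the top-r eigenvectors of a covariance.

    Returns the r x n compression matrix and the excess mean-squared
    error (the energy in the discarded directions).
    """
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]        # re-sort descending
    T = vecs[:, order[:r]].T              # keep the top-r directions
    excess_mse = vals[order[r:]].sum()    # MSE lost to compression
    return T, excess_mse
```

With correlated signal and noise, or a noisy channel after compression, the optimal matrix is no longer a bare eigenvector projection, which is what the closed-form canonical-coordinate solutions above address.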
Two fast-to-compute algorithms, one extrinsic and the other intrinsic, are proposed to approximate the optimal solutions in general cases. In particular, the intrinsic algorithm exploits the geometry of the unit sphere, a manifold embedded in Euclidean space. The performances of the proposed algorithms are examined through a simulation study. A discussion of the choice of dimension for the secondary channel is given, leading to rules for dimension reduction.

Item Open Access Constrained spline regression and hypothesis tests in the presence of correlation (Colorado State University. Libraries, 2013) Wang, Huan, author; Meyer, Mary C., advisor; Opsomer, Jean D., advisor; Breidt, F. Jay, committee member; Reich, Robin M., committee member
Extracting the trend from the pattern of observations is always difficult, especially when the trend is obscured by correlated errors. Often, prior knowledge of the trend does not include a parametric family; instead, the valid assumptions are vague, such as "smooth" or "monotone increasing." Incorrectly specifying the trend as some simple parametric form can lead to overestimation of the correlation, and conversely, misspecifying or ignoring the correlation leads to erroneous inference for the trend. In this dissertation, we explore spline regression with shape constraints, such as monotonicity or convexity, for estimation and inference in the presence of stationary AR(p) errors. Standard criteria for selection of the penalty parameter, such as the Akaike information criterion (AIC), cross-validation and generalized cross-validation, have been shown to behave badly when the errors are correlated, even in the absence of shape constraints. In this dissertation, the correlation structure and penalty parameter are selected simultaneously using a correlation-adjusted AIC. The asymptotic properties of unpenalized spline regression in the presence of correlation are investigated.
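The simplest instance of shape-constrained estimation is the least-squares monotone fit, computed by the pool-adjacent-violators algorithm; constrained spline regression adds smoothness and a roughness penalty on top of this idea. A minimal sketch (not the dissertation's estimator):

```python
def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a
    sequence. Adjacent blocks that violate monotonicity are repeatedly
    merged into their weighted mean.
    """
    blocks = []  # each block is [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge backwards while the monotone constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2 = blocks.pop()
            v1, w1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    fit = []
    for v, w in blocks:
        fit.extend([v] * w)
    return fit
```

For example, `pava([1, 3, 2])` pools the violating pair (3, 2) into 2.5, leaving a non-decreasing fit.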
It is proved that even if the estimation of the correlation is inconsistent, the corresponding projection estimator of the regression function can still be consistent and attain the optimal asymptotic rate, under appropriate conditions. The constrained spline fit attains the convergence rate of the unconstrained spline fit in the presence of AR(p) errors. Simulation results show that the constrained estimator typically behaves better than the unconstrained version if the true trend satisfies the constraints. Traditional statistical tests for the significance of a trend rely on restrictive assumptions on the functional form of the relationship, e.g. linearity. In this dissertation, we develop testing procedures that incorporate shape restrictions on the trend and can account for correlated errors. These tests can be used to check whether the trend is constant versus monotone, linear versus convex/concave, and any combination such as constant versus increasing and convex. The proposed likelihood ratio test statistics have an exact null distribution if the covariance matrix of the errors is known. Theorems are developed for the asymptotic distributions of the test statistics if the covariance matrix is unknown but the test statistics incorporate a consistent estimator of the correlation. The proposed test is compared, through intensive simulations, with the F-test using the unconstrained alternative fit and the one-sided t-test using the simple regression alternative fit. Both the size and the power of the proposed test are favorable: smaller size and greater power, in general, compared to the F-test and t-test.

Item Open Access Joint tail modeling via regular variation with applications in climate and environmental studies (Colorado State University. Libraries, 2013) Weller, Grant B., author; Cooley, Dan, advisor; Breidt, F.
Jay, committee member; Estep, Donald, committee member; Schumacher, Russ, committee member
This dissertation presents applied, theoretical, and methodological advances in the statistical analysis of multivariate extreme values, employing the underlying mathematical framework of multivariate regular variation. Existing theory is applied in two studies in climatology; these investigations represent novel applications of the regular variation framework in this field. Motivated by applications in environmental studies, a theoretical development in the analysis of extremes is introduced, along with novel statistical methodology. This work first details a novel study which employs the regular variation modeling framework to study uncertainties in a regional climate model's simulation of extreme precipitation events along the west coast of the United States, with a particular focus on the Pineapple Express (PE), a special type of winter storm. We model the tail dependence in past daily precipitation amounts seen in observational data and output of the regional climate model, and we link atmospheric pressure fields to PE events. The fitted dependence model is utilized as a stochastic simulator of future extreme precipitation events, given output from a future-scenario run of the climate model. The simulator and link to pressure fields are used to quantify the uncertainty in a future simulation of extreme precipitation events from the regional climate model, given boundary conditions from a general circulation model. A related study investigates two case studies of extreme precipitation from six regional climate models in the North American Regional Climate Change Assessment Program (NARCCAP). We find that simulated winter-season daily precipitation along the Pacific coast exhibits tail dependence to extreme events in the observational record.
When considering summer season daily precipitation over a central region of the United States, however, we find almost no correspondence between extremes simulated by NARCCAP and those seen in observations. Furthermore, we discover less consistency among the NARCCAP models in the tail behavior of summer precipitation over this region than that seen in winter precipitation over the west coast region. The analyses in this work indicate that the NARCCAP models are effective at downscaling winter precipitation extremes in the west coast region, but questions remain about their ability to simulate summer-season precipitation extremes in the central region. A deficiency of existing modeling techniques based on the multivariate regular variation framework is the inability to account for hidden regular variation, a feature of many theoretical examples and real data sets. One particular example of this deficiency is the inability to distinguish asymptotic independence from independence in the usual sense. This work develops a novel probabilistic characterization of random vectors possessing hidden regular variation as the sum of independent components. The characterization is shown to be asymptotically valid via a multivariate tail equivalence result, and an example is demonstrated via simulation. The sum characterization is employed to perform inference for the joint tail of random vectors possessing hidden regular variation. This dissertation develops a likelihood-based estimation procedure, employing a novel version of the Monte Carlo expectation-maximization algorithm which has been modified for tail estimation. The methodology is demonstrated on simulated data and applied to a bivariate series of air pollution data from Leeds, UK. 
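Tail dependence of the kind measured in these comparisons is often summarized by the empirical coefficient chi: the conditional probability that one series exceeds its high quantile given that the other does. A small chi as the quantile level rises suggests asymptotic independence, precisely the regime where hidden regular variation can lurk. An illustrative sketch (a standard diagnostic, not the dissertation's estimator):

```python
def chi_hat(x, y, q):
    """Empirical tail dependence: P(Y > its q-quantile | X > its q-quantile)."""
    xs, ys = sorted(x), sorted(y)
    k = int(q * len(x))
    ux, uy = xs[k], ys[k]  # empirical q-quantiles
    exceed_x = [i for i in range(len(x)) if x[i] > ux]
    both = sum(1 for i in exceed_x if y[i] > uy)
    return both / len(exceed_x) if exceed_x else 0.0
```

Comonotone series give chi near 1, while series that are never large together give chi near 0.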
We demonstrate the improvement in tail risk estimates offered by the sum representation over approaches which ignore hidden regular variation in the data.

Item Open Access Model selection and nonparametric estimation for regression models (Colorado State University. Libraries, 2014) He, Zonglin, author; Opsomer, Jean, advisor; Breidt, F. Jay, committee member; Meyer, Mary, committee member; Elder, John, committee member
In this dissertation, we deal with two different topics in statistics. The first topic, in survey sampling, deals with variable selection for a linear regression model under a possibly informative sampling design. Under the assumption that the finite population is generated by a multivariate linear regression model from which we will sample with a possibly informative design, we theoretically study the variable selection criterion named predicted residual sums of squares in the sampling context. We examine the asymptotic properties of weighted and unweighted predicted residual sums of squares under weighted least squares regression estimation and ordinary least squares regression estimation. A simulation study of the variable selection criteria is provided, with the purpose of showing their ability to select the correct model in practical situations. For the second topic, we are interested in fitting a nonparametric regression model to data for the situation in which some of the covariates are categorical. In the univariate case where the covariate is an ordinal variable, we extend the local polynomial estimator, which normally requires continuous covariates, to a local polynomial estimator that allows for ordered categorical covariates. We derive the asymptotic conditional bias and variance for the local polynomial estimator with an ordinal covariate, under the assumption that the categories correspond to quantiles of an unobserved continuous latent variable.
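A degree-zero local polynomial fit with an ordered categorical covariate can be sketched using a geometric kernel in the category distance, one common choice for ordinal kernels (the dissertation's estimator and kernel choice may differ):

```python
def nw_ordinal(x, y, x0, lam):
    """Degree-zero local polynomial (Nadaraya-Watson) estimate at ordinal
    category x0, with geometric kernel weight lam ** |x_i - x0|.

    lam = 0 reduces to the cell mean at x0; lam = 1 gives the global mean.
    """
    w = [lam ** abs(xi - x0) for xi in x]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

The bandwidth-like parameter lam in [0, 1] controls how much neighboring categories borrow strength from one another.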
We conduct a simulation study with two patterns of ordinal data to evaluate our estimator. In the multivariate case where the covariates contain a mixture of continuous, ordinal, and nominal variables, we use a Nadaraya-Watson estimator with a generalized product kernel. We derive the asymptotic conditional bias and variance for the Nadaraya-Watson estimator with continuous, ordinal, and nominal covariates, under the assumption that the categories of the ordinal covariate correspond to quantiles of an unobserved continuous latent variable. We conduct a multivariate simulation study to evaluate our Nadaraya-Watson estimator with the generalized product kernel.

Item Open Access Testing and adjusting for informative sampling in survey data (Colorado State University. Libraries, 2014) Herndon, Wade Wilson, author; Breidt, F. Jay, advisor; Opsomer, Jean, advisor; Cooley, Daniel, committee member; Meyer, Mary, committee member; Doherty, Paul, committee member
Fitting models to survey data can be problematic due to the potentially complex sampling mechanism through which the observed data are selected. Survey weights have traditionally been used to adjust for unequal inclusion probabilities under the design-based paradigm of inference; however, this limits the ability of analysts to make inference of a more general kind, such as to characteristics of a superpopulation. The problems induced by the presence of a complex sampling design can be generally contained under the heading of informative sampling. To say that the sampling is informative is to say that the distribution of the data in the sample is different from the distribution of the data in the population. Two major topics relating to analyzing survey data with (potentially) informative sampling are addressed: testing for informativeness, and model building in the presence of informative sampling. First addressed is the problem of running formal tests for informative sampling in survey data.
The major contribution contained here is to detail a new test for informative sampling. The test is shown to be widely applicable, straightforward to implement in practice, and advantageous compared to existing tests. The test is illustrated through a variety of empirical studies as well. These applications include a censored regression problem, linear regression, logistic regression, and fitting a gamma mixture model. Results from the analogous bootstrap test are also presented; these results agree with the analytic versions of the test. Alternative tests for informative sampling do exist; however, the existing methods each have significant drawbacks and limitations which may be resolved in some situations with this new methodology, and overall the literature in this area is quite sparse. In a simulation study, the test is shown to have many desirable properties and maintains high power compared to alternative tests. Also included is discussion about the limiting distribution of the test statistic under a sequence of local alternative hypotheses, and some extensions that are useful in connecting the work contained here with some of the previous work in the area. These extensions also help motivate the semiparametric methods considered in the chapter that follows. The next topic explored is semiparametric methods for including design information in a regression model while staying within a model-based inferential framework. The ideas explored here attempt to exploit relationships between design variables (such as the sample inclusion probabilities) and model covariates. In order to account for the complex sampling design and (potential) bias in estimating model parameters, design variables are included as covariates and considered to be functions of the model covariates that can then be estimated in a design-based paradigm using nonparametric methods. The nonparametric method explored here is kernel smoothing with degree zero.
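A classical diagnostic for informative sampling (distinct from the new test developed in this dissertation) compares survey-weighted and unweighted least-squares fits: under non-informative sampling both estimate the same superpopulation coefficients, so a large gap is evidence that the design is informative for the model. A sketch under that classical setup:

```python
import numpy as np

def weighted_vs_unweighted(X, y, w):
    """Compare unweighted and survey-weighted least-squares coefficient
    vectors; returns both fits and the largest absolute difference.
    """
    bu = np.linalg.lstsq(X, y, rcond=None)[0]
    sw = np.sqrt(w)
    # weighted LS via the square-root-weight transformation
    bw = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return bu, bw, np.abs(bu - bw).max()
```

A formal version of this comparison would scale the gap by its estimated variance to obtain a Wald-type statistic.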
In principle, other (and more complex) kinds of estimators could be used to estimate the functions of the design variables conditional on the model covariates, but the framework presented here provides asymptotic results only for the simpler case of kernel smoothing. The method is illustrated via empirical applications and also through a simulation study in which confidence band coverage rates from the semiparametric method are compared to those obtained through regular linear regression. The semiparametric estimator soundly outperforms the regression estimator.

Item Open Access Semiparametric regression in the presence of complex variance structures arising from small angle x-ray scattering data (Colorado State University. Libraries, 2014) Bugbee, Bruce D., author; Breidt, F. Jay, advisor; Estep, Don, advisor; Meyer, Mary, committee member; Hoeting, Jennifer, committee member; Luger, Karolin, committee member
An ongoing problem in structural biology is how best to infer structural information for complex, biological macromolecules from indirect observational data. Molecular shape dictates functionality but is not always directly observable. There exists a wide class of experimental methods whose data can be used for indirectly inferring molecular shape features with varying degrees of resolution. Of these methods, small angle X-ray scattering (SAXS) is desirable due to low requirements on the sample of interest. However, SAXS data suffer from numerous statistical problems that require the development of novel methodologies. A primary concern is the impact of radially reducing two-dimensional sensor data to a series of smooth mean and variance curves. Additionally, pronounced heteroskedasticity is often observed near sensor boundaries. The work presented here focuses on developing general model frameworks and implementation methods appropriate for SAXS data.
Semiparametric regression refers to models that combine known parametric structures with flexible nonparametric components. Three semiparametric regression model frameworks that are well-suited for handling smooth data are presented. The first model introduced is the standard semiparametric regression model, described as a mixed model with low-rank penalized splines as random effects. The second model extends the first to the case of heteroskedastic errors, which violate standard model assumptions. The latent variance function in the model is estimated through an additional semiparametric regression, allowing for appropriate uncertainty estimation at the mean level. The final model considers a data structure unique to SAXS experiments. This model incorporates both radial mean and radial variance data in the hope of better inferring three-dimensional shape properties and understanding experimental effects by including all available data. Each of the three model frameworks is structured hierarchically. Bayesian inference is appealing in this context, as it provides efficient and generalized modeling frameworks in a unified way. The main statistical contributions of this thesis are from the specific methods developed to address the computational challenges of Bayesian inference for these models. The contributions include new Markov chain Monte Carlo (MCMC) procedures for numerical approximation of posterior distributions and novel variational approximations that are extremely fast and accurate. For the heteroskedastic semiparametric case, known-form posterior conditionals are available for all model parameters save for the regression coefficients controlling the latent model variance function. A novel implementation of a multivariate delayed rejection adaptive Metropolis (DRAM) procedure is used to sample from this posterior conditional distribution.
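The mixed-model form of penalized spline regression described above can be sketched as ridge regression on a truncated-line basis, penalizing only the knot coefficients that play the role of random effects (a minimal sketch with a hypothetical `pspline_fit`; the Bayesian hierarchical versions in this work elaborate on this structure):

```python
import numpy as np

def pspline_fit(x, y, knots, lam):
    """Low-rank penalized spline fit: intercept, slope, and truncated-line
    basis functions (x - k)_+ at each knot, with a ridge penalty lam on
    the knot coefficients only. In the mixed-model view, lam is the ratio
    of error variance to random-effect variance.
    """
    X = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(x - k, 0.0) for k in knots])
    # penalize only the truncated-line (random-effect) coefficients
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta
```

Because the linear part is unpenalized, exactly linear data are reproduced for any penalty; only the departures from linearity are shrunk.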
The joint model for radial mean and radial variance data is shown to be of comparable structure to the heteroskedastic case, and the new DRAM methodology is extended to handle this case. Simulation studies of all three methods are provided, showing that these models provide accurate fits of observed data and latent variance functions. The demands of scientific data processing in the context of SAXS, where large data sets are rapidly attained, lead to consideration of fast approximations as alternatives to MCMC. Variational approximations, or variational Bayes, describe a class of approximation methods in which the posterior distribution of the parameters is approximated by minimizing the Kullback-Leibler divergence between the true posterior and a class of distributions under mild structural constraints. Variational approximations have been shown to be good approximations of true posteriors in many cases. A novel variational approximation for the general heteroskedastic semiparametric regression model is derived here. Simulation studies are provided demonstrating fit and coverage properties comparable to the DRAM results at a fraction of the computational cost. A variational approximation for the joint model of radial mean and variance data is also provided but is shown to suffer from poor performance due to high correlation across a subset of regression parameters. The heteroskedastic semiparametric regression framework has some strong structural relationships with a distinct, important problem: spatially adaptive smoothing. A noisy function with different amounts of smoothness over its domain may be systematically under-smoothed or over-smoothed if the smoothing is not spatially adaptive. A novel variational approximation is derived for the problem of spatially adaptive penalized spline regression, and shown to have excellent performance.
This approximation method is shown to be able to fit highly oscillatory data while not requiring the traditional tuning and computational resources of standard MCMC implementations. Potential scientific contributions of the statistical methodology developed here are illuminated with SAXS data examples. Analysis of SAXS data typically has two primary concerns: description of experimental effects and estimation of physical shape parameters. Formal statistical procedures for testing the effect of sample concentration and exposure time are presented as alternatives to current methods, in which data sets are evaluated subjectively and often combined in ad hoc ways. Additionally, estimation procedures for the scattering intensity at zero angle, known to be proportional to molecular weight, and for the radius of gyration are described, along with appropriate measures of uncertainty. Finally, a brief example of the joint radial mean and variance method is provided. Guidelines for extending the models presented here to more complex SAXS problems are also given.