Theses and Dissertations
Browsing Theses and Dissertations by Author "Breidt, F. Jay, committee member"
Now showing 1 - 18 of 18
Item Open Access: A penalized estimation procedure for varying coefficient models (Colorado State University. Libraries, 2015)
Tu, Yan, author; Wang, Haonan, advisor; Breidt, F. Jay, committee member; Chapman, Phillip, committee member; Luo, J. Rockey, committee member
Varying coefficient models are widely used for analyzing longitudinal data. Various methods for estimating coefficient functions have been developed over the years. We revisit the problem under the theme of functional sparsity. The problem of sparsity, including global sparsity and local sparsity, is a recurrent topic in nonparametric function estimation. A function has global sparsity if it is zero over the entire domain, indicating that the corresponding covariate is irrelevant to the response variable. A function has local sparsity if it is nonzero overall but remains zero on a set of intervals, identifying inactive periods of the corresponding covariate. Each type of sparsity has been addressed in the literature using the idea of regularization to improve both estimation and interpretability. In this dissertation, a penalized estimation procedure is developed to achieve functional sparsity, that is, to address both types of sparsity simultaneously in a unified framework. We exploit the properties of B-spline approximation and group bridge penalization. Our method is illustrated in a simulation study and a real data analysis, and outperforms existing methods in identifying both local and global sparsity. Asymptotic properties of estimation consistency and sparsistency of the proposed method are established, where sparsistency refers to the property that the functional sparsity can be consistently detected.
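The entry above names two concrete ingredients, a B-spline approximation of the coefficient function and a group bridge penalty. A minimal sketch of how these pieces can fit together is given below (Python with numpy/scipy, not the dissertation's implementation); the knot grid, the penalty weight lam, the bridge exponent gamma, and the overlapping group layout are all illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy varying-coefficient data: y = beta(t) * x + noise, observed on [0, 1].
n = 300
t = np.sort(rng.uniform(0, 1, n))
x = rng.normal(size=n)
beta_true = np.where(t < 0.5, np.sin(2 * np.pi * t), 0.0)  # locally sparse
y = beta_true * x + 0.1 * rng.normal(size=n)

# Cubic B-spline basis for beta(t); groups = consecutive basis coefficients.
degree = 3
interior = np.linspace(0, 1, 8)[1:-1]
knots = np.concatenate([[0.0] * (degree + 1), interior, [1.0] * (degree + 1)])
n_basis = len(knots) - degree - 1
B = BSpline.design_matrix(t, knots, degree).toarray()   # n x n_basis
Z = B * x[:, None]                                      # varying-coefficient design

lam, gamma = 1.0, 0.5  # illustrative penalty weight and bridge exponent

def objective(c):
    # Least squares plus a group bridge penalty on overlapping groups of
    # adjacent coefficients; zeroing a whole group flattens beta(t) on the
    # matching subinterval, producing local sparsity.
    rss = np.sum((y - Z @ c) ** 2)
    groups = [list(range(j, j + degree + 1)) for j in range(n_basis - degree)]
    pen = sum(np.sum(np.abs(c[g])) ** gamma for g in groups)
    return rss + lam * pen

c0 = np.linalg.lstsq(Z, y, rcond=None)[0]
fit = minimize(objective, c0, method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6})
beta_hat = B @ fit.x
print("max |beta_hat| on [0.5, 1]:", np.abs(beta_hat[t > 0.5]).max())
```

A derivative-free optimizer is used here only because the bridge penalty is nonsmooth at zero; dedicated algorithms for group bridge problems would be used in practice.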
Item Open Access: Analysis of structured data and big data with application to neuroscience (Colorado State University. Libraries, 2015)
Sienkiewicz, Ela, author; Wang, Haonan, advisor; Meyer, Mary, committee member; Breidt, F. Jay, committee member; Hayne, Stephen, committee member
Neuroscience research leads to a remarkable set of statistical challenges, many of them due to the complexity of the brain, its intricate structure, and its dynamical, non-linear, often non-stationary behavior. The challenge of modeling brain functions is magnified by the quantity and inhomogeneity of data produced by scientific studies. Here we show how to take advantage of advances in distributed and parallel computing to mitigate memory and processor constraints and develop models of neural components and neural dynamics. First we consider the problem of function estimation and selection in time-series functional dynamical models. Our motivating application is the point-process spiking activity recorded from the brain, which poses major computational challenges for modeling even moderately complex brain functionality. We present a big data approach to the identification of sparse nonlinear dynamical systems using generalized Volterra kernels and their approximation by B-spline basis functions. The performance of the proposed method is demonstrated in experimental studies. We also consider a set of unlabeled tree objects with topological and geometric properties. For each data object, two curve representations are developed to characterize its topological and geometric aspects. We further define the notions of topological and geometric medians as well as quantiles based on both representations. In addition, we take a novel approach to defining Pareto medians and quantiles through a multi-objective optimization problem. In particular, we study two different objective functions, which measure topological variation and geometric variation respectively. Analytical solutions are provided for topological and geometric medians and quantiles; for Pareto medians and quantiles in general, a genetic algorithm is implemented. The proposed methods are applied to analyze a data set of pyramidal neurons.
Item Open Access: Change-point estimation using shape-restricted regression splines (Colorado State University. Libraries, 2016)
Liao, Xiyue, author; Meyer, Mary C., advisor; Breidt, F. Jay, committee member; Homrighausen, Darren, committee member; Belfiori, Elisa, committee member
Change-point estimation is needed in areas such as climate change, signal processing, economics, and dose-response analysis, but it has not yet been fully explored. We consider estimating a regression function ƒm and a change-point m, where m is a mode, an inflection point, or a jump point. Linear inequality constraints are used with spline regression functions to estimate m and ƒm simultaneously using profile methods. For a given m, the maximum-likelihood estimate of ƒm is found using constrained regression methods; the set of possible change-points is then searched for the m̂ that maximizes the likelihood. Convergence rates are obtained for each type of change-point estimator, and we show an oracle property: the regression function estimator converges at the same rate as if m were known. Parametrically modeled covariates are easily incorporated in the model. Simulations show that for small and moderate sample sizes, these methods compare well to existing methods. The scenario in which the random error comes from a stationary autoregressive process is also presented. Under such a scenario, the change-point and the parameters of the stationary autoregressive process, such as the autoregressive coefficients and the model variance, are estimated together via Cochrane-Orcutt-type iterations. Simulations show that the change-point estimator performs well in terms of choosing the right order of the autoregressive process. Penalized spline-based regression is also discussed as an extension. Given a large number of knots and a penalty parameter which controls the effective degrees of freedom of a shape-restricted model, penalized methods give smoother fits while balancing under- and over-fitting. A bootstrap confidence interval for a change-point is established. By generating random change-points from a curve on the unit interval, we compute the coverage rate of the bootstrap confidence interval using penalized estimators, which shows advantages, such as robustness, over competitors. The methods are available in the R package ShapeChange on the Comprehensive R Archive Network (CRAN). Moreover, we discuss the shape selection problem when there is more than one possible shape for a given data set. A project with Forest Inventory & Analysis (FIA) scientists is included as an example. In this project, we apply shape-restricted spline-based estimators, among which the one-jump and double-jump estimators are emphasized, to time-series Landsat imagery for the purpose of modeling, mapping, and monitoring annual forest disturbance dynamics. For each pixel and spectral band or index of choice in temporal Landsat data, our method delivers a smoothed rendition of the trajectory constrained to behave in an ecologically sensible manner, reflecting one of seven possible "shapes". Routines to realize the methodology are built into the R package ShapeSelectForest on CRAN, and techniques in this package are being applied for forest disturbance and attribute mapping across the conterminous U.S. The Landsat community will implement techniques in this package on the Google Earth Engine in 2016. Finally, we consider change-point estimation with generalized linear models. Such work can be applied to dose-response analysis, where the effect of a drug increases as the dose increases up to a saturation point, after which the effect starts decreasing.
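The profile strategy described in this entry, fitting the regression for each candidate change-point and keeping the candidate that maximizes the likelihood, can be illustrated with a deliberately simplified stand-in: a jump point in an otherwise smooth mean, fitted by unconstrained quadratic least squares on each side rather than by the shape-restricted splines of the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data with a jump at m = 0.6.
n = 200
x = np.sort(rng.uniform(0, 1, n))
f = np.sin(3 * x) + 1.5 * (x >= 0.6)
y = f + 0.2 * rng.normal(size=n)

def profile_rss(m):
    # For a candidate jump point m, fit a quadratic on each side and return
    # the residual sum of squares; under Gaussian errors the profile
    # likelihood is monotone in this quantity.
    rss = 0.0
    for mask in (x < m, x >= m):
        if mask.sum() < 4:
            return np.inf
        coef = np.polyfit(x[mask], y[mask], deg=2)
        rss += np.sum((y[mask] - np.polyval(coef, x[mask])) ** 2)
    return rss

# Search candidate jump points over interior observation locations.
candidates = x[10:-10]
m_hat = min(candidates, key=profile_rss)
print("estimated jump point:", round(m_hat, 3))
```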
Item Open Access: Constrained spline regression and hypothesis tests in the presence of correlation (Colorado State University. Libraries, 2013)
Wang, Huan, author; Meyer, Mary C., advisor; Opsomer, Jean D., advisor; Breidt, F. Jay, committee member; Reich, Robin M., committee member
Extracting the trend from a pattern of observations is always difficult, especially when the trend is obscured by correlated errors. Often, prior knowledge of the trend does not include a parametric family; instead, the valid assumptions are vague, such as "smooth" or "monotone increasing". Incorrectly specifying the trend as some simple parametric form can lead to overestimation of the correlation, and conversely, misspecifying or ignoring the correlation leads to erroneous inference for the trend. In this dissertation, we explore spline regression with shape constraints, such as monotonicity or convexity, for estimation and inference in the presence of stationary AR(p) errors. Standard criteria for selection of the penalty parameter, such as the Akaike information criterion (AIC), cross-validation, and generalized cross-validation, have been shown to behave badly when the errors are correlated, even in the absence of shape constraints. In this dissertation, the correlation structure and the penalty parameter are selected simultaneously using a correlation-adjusted AIC. The asymptotic properties of unpenalized spline regression in the presence of correlation are investigated. It is proved that even if the estimation of the correlation is inconsistent, the corresponding projection estimator of the regression function can still be consistent and attain the optimal asymptotic rate, under appropriate conditions. The constrained spline fit attains the convergence rate of the unconstrained spline fit in the presence of AR(p) errors. Simulation results show that the constrained estimator typically behaves better than the unconstrained version if the true trend satisfies the constraints. Traditional statistical tests for the significance of a trend rely on restrictive assumptions about the functional form of the relationship, e.g. linearity. In this dissertation, we develop testing procedures that incorporate shape restrictions on the trend and can account for correlated errors. These tests can be used to check whether the trend is constant versus monotone, linear versus convex/concave, or any combination of these, such as constant versus increasing and convex. The proposed likelihood ratio test statistics have an exact null distribution if the covariance matrix of the errors is known. Theorems are developed for the asymptotic distributions of the test statistics when the covariance matrix is unknown but a consistent estimator of the correlation is used in their computation. The proposed test is compared, through intensive simulations, with the F-test against the unconstrained alternative fit and with the one-sided t-test against the simple regression alternative fit. Both the size and the power of the proposed test are favorable: smaller size and greater power in general, compared to the F-test and the t-test.

Item Open Access: Inference for functional time series with applications to yield curves and intraday cumulative returns (Colorado State University. Libraries, 2016)
Young, Gabriel J., author; Kokoszka, Piotr S., advisor; Miao, Hong, committee member; Breidt, F. Jay, committee member; Zhou, Wen, committee member
Econometric and financial data often take the form of a functional time series. Examples include yield curves, intraday price curves, and term structure curves. Before an attempt is made to statistically model or predict such a series, we must address whether it can be assumed stationary or trend stationary. We develop extensions of the KPSS stationarity test to functional time series. Motivated by the problem of a change in the mean structure of yield curves, we also introduce several change point methods applied to dynamic factor models. For all testing procedures, we include a complete asymptotic theory, a simulation study, illustrative data examples, and details of the numerical implementation. The impact of scheduled macroeconomic announcements has been shown to account for sizable fractions of total annual realized stock returns. To assess this impact, we develop methods of derivative estimation which utilize a functional analogue of local-polynomial smoothing. The resulting confidence bands are then used to find time intervals of statistically increasing cumulative returns.
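A KPSS-type statistic for functional time series aggregates the partial sums of the demeaned curves over both the time index and the function's domain. The sketch below computes that aggregate for curves stored on a common grid; it is schematic in that it evaluates the statistic only, while the null critical values, which in this setting depend on the long-run covariance of the curves, are left aside.

```python
import numpy as np

rng = np.random.default_rng(2)

def functional_kpss_statistic(X):
    """KPSS-type statistic for a functional time series.

    X : array of shape (N, m); row i is curve i on a common grid of m
    points over [0, 1].  Returns a Riemann approximation of the double
    integral of Z_N(u, t)^2 over u and t, where Z_N is the normalized
    partial-sum process of the demeaned curves.
    """
    N, _ = X.shape
    D = X - X.mean(axis=0)                  # demeaned curves
    S = np.cumsum(D, axis=0) / np.sqrt(N)   # Z_N evaluated at u = k/N
    return float(np.mean(S ** 2))

# Stationary curves (iid noise) versus curves with a drifting level.
N, m = 200, 50
stationary = rng.normal(size=(N, m))
trending = stationary + np.outer(np.linspace(0, 3, N), np.ones(m))
print("stationary:", functional_kpss_statistic(stationary))
print("trending:  ", functional_kpss_statistic(trending))
```

Large values of the statistic point away from stationarity, which is why the drifting series produces a much bigger value than the iid one.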
Item Open Access: Infinite dimensional stochastic inverse problems (Colorado State University. Libraries, 2018)
Yang, Lei, author; Estep, Donald, advisor; Breidt, F. Jay, committee member; Tavener, Simon, committee member; Zhou, Wen, committee member
In many disciplines, mathematical models such as differential equations are used to characterize physical systems. The model induces a complex nonlinear measurable map from the domain of physical parameters to the range of observable Quantities of Interest (QoI), computed by applying a set of functionals to the solution of the model. Often the parameters cannot be directly measured, and one is confronted with the task of inferring information about their values from measured or imposed information about the values of the QoI. In such applications, there is generally significant uncertainty in the measured values of the QoI. Uncertainty is often modeled using probability distributions. For example, a probability structure imposed on the domain of the parameters induces a corresponding probability structure on the range of the QoI. This is the well-known Stochastic Forward Problem, typically solved using a variation of the Monte Carlo method. This dissertation is concerned with the Stochastic Inverse Problem (SIP), in which probability distributions are imposed on the range of the QoI and the problem is to compute the induced distributions on the domain of the parameters. In our formulation of the SIP and its generalization to the case where the physical parameters are functions, the main topics, including existence, continuity, and numerical approximation of the solutions, are investigated. Chapter 1 introduces the background and previous research on the SIP, along with useful theorems, results, and notation used later. Chapter 2 begins by establishing a relationship between Lebesgue measures on the domain and the range, and then studies the form of the solution of the SIP and its continuity properties. Chapter 3 proposes an algorithm for computing the solution of the SIP and discusses the convergence of the algorithm to the true solution. Chapter 4 exploits the fact that a function can be represented by its coefficients with respect to a basis, and extends the SIP framework to cases where the domain representing the basis coefficients is a countable cube with decaying edges, referred to as the infinite dimensional SIP; we then discuss how its solution can be approximated by the SIP whose domain is the finite dimensional cube obtained by taking a finite dimensional projection of the countable cube. Chapter 5 begins with an algorithm for approximating the solution of the infinite dimensional SIP, and then proves that the algorithm converges to the true solution. Chapter 6 gives a numerical example showing the effects of different decay rates and the relation of truncation to finite dimensions. Chapter 7 reviews popular probabilistic inverse problem methods and proposes a combination of the SIP and statistical models to address problems encountered in practice.
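One way to make the push-forward construction in this entry concrete is density-ratio reweighting: sample an initial distribution on the parameter domain, push it through the model map, and reweight so that the weighted parameter sample reproduces the distribution imposed on the QoI. The toy map Q, the uniform initial sample, and the normal QoI distribution below are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(3)

# Model: a nonlinear map Q from a 2-d parameter domain to a scalar QoI.
def Q(theta):
    return theta[:, 0] ** 2 + 0.5 * theta[:, 1]

# Imposed (observed) distribution on the QoI.
observed = norm(loc=1.0, scale=0.1)

# Sample an initial distribution on the domain [-1, 1]^2, push it forward
# through Q, and reweight by the ratio of the imposed QoI density to the
# push-forward density.
n = 20000
theta = rng.uniform(-1, 1, size=(n, 2))
q = Q(theta)
pushforward = gaussian_kde(q)            # density of Q under the initial sample
w = observed.pdf(q) / pushforward(q)     # density-ratio weights
w /= w.sum()

# The weighted (resampled) parameter sample approximates a SIP solution
# consistent with the imposed QoI distribution.
idx = rng.choice(n, size=5000, p=w)
post = theta[idx]
print("mean of Q under updated sample:", Q(post).mean())  # close to 1.0
```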
Item Open Access: Joint tail modeling via regular variation with applications in climate and environmental studies (Colorado State University. Libraries, 2013)
Weller, Grant B., author; Cooley, Dan, advisor; Breidt, F. Jay, committee member; Estep, Donald, committee member; Schumacher, Russ, committee member
This dissertation presents applied, theoretical, and methodological advances in the statistical analysis of multivariate extreme values, employing the underlying mathematical framework of multivariate regular variation. Existing theory is applied in two studies in climatology; these investigations represent novel applications of the regular variation framework in this field. Motivated by applications in environmental studies, a theoretical development in the analysis of extremes is introduced, along with novel statistical methodology. This work first details a novel study which employs the regular variation modeling framework to study uncertainties in a regional climate model's simulation of extreme precipitation events along the west coast of the United States, with a particular focus on the Pineapple Express (PE), a special type of winter storm. We model the tail dependence in past daily precipitation amounts seen in observational data and in output of the regional climate model, and we link atmospheric pressure fields to PE events. The fitted dependence model is utilized as a stochastic simulator of future extreme precipitation events, given output from a future-scenario run of the climate model. The simulator and the link to pressure fields are used to quantify the uncertainty in a future simulation of extreme precipitation events from the regional climate model, given boundary conditions from a general circulation model. A related study investigates two case studies of extreme precipitation from six regional climate models in the North American Regional Climate Change Assessment Program (NARCCAP). We find that simulated winter-season daily precipitation along the Pacific coast exhibits tail dependence with extreme events in the observational record. When considering summer-season daily precipitation over a central region of the United States, however, we find almost no correspondence between extremes simulated by NARCCAP and those seen in observations. Furthermore, we discover less consistency among the NARCCAP models in the tail behavior of summer precipitation over this region than in winter precipitation over the west coast region. The analyses in this work indicate that the NARCCAP models are effective at downscaling winter precipitation extremes in the west coast region, but questions remain about their ability to simulate summer-season precipitation extremes in the central region. A deficiency of existing modeling techniques based on the multivariate regular variation framework is the inability to account for hidden regular variation, a feature of many theoretical examples and real data sets. One particular example of this deficiency is the inability to distinguish asymptotic independence from independence in the usual sense. This work develops a novel probabilistic characterization of random vectors possessing hidden regular variation as the sum of independent components. The characterization is shown to be asymptotically valid via a multivariate tail equivalence result, and an example is demonstrated via simulation. The sum characterization is employed to perform inference for the joint tail of random vectors possessing hidden regular variation. This dissertation develops a likelihood-based estimation procedure, employing a novel version of the Monte Carlo expectation-maximization algorithm modified for tail estimation. The methodology is demonstrated on simulated data and applied to a bivariate series of air pollution data from Leeds, UK. We demonstrate the improvement in tail risk estimates offered by the sum representation over approaches which ignore hidden regular variation in the data.
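A standard diagnostic behind distinctions like asymptotic dependence versus asymptotic independence is the empirical tail-dependence measure chi(u), which the sketch below estimates from rank-transformed pairs; the shared-factor construction of the dependent pair is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

def chi_hat(x, y, u):
    """Empirical chi(u) = P(F(X) > u, G(Y) > u) / (1 - u), from ranks.
    chi(u) tends to 0 as u -> 1 under asymptotic independence and stays
    bounded away from 0 under asymptotic dependence."""
    n = len(x)
    fx = np.argsort(np.argsort(x)) / (n + 1.0)  # rank transform to (0, 1)
    fy = np.argsort(np.argsort(y)) / (n + 1.0)
    return np.mean((fx > u) & (fy > u)) / (1 - u)

# Dependent pair (shared heavy-tailed factor) versus an independent pair.
n = 50000
z = rng.standard_t(df=3, size=n)
dep_x = z + 0.5 * rng.normal(size=n)
dep_y = z + 0.5 * rng.normal(size=n)
ind_x, ind_y = rng.standard_t(3, n), rng.standard_t(3, n)
for u in (0.9, 0.99):
    print(u, round(chi_hat(dep_x, dep_y, u), 3),
          round(chi_hat(ind_x, ind_y, u), 3))
```

For the independent pair the estimate decays toward 1 - u, while the shared-factor pair keeps a sizable chi(u) at high thresholds; hidden regular variation concerns exactly the structure that such first-order summaries can miss.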
Item Open Access: Linear system design for compression and fusion (Colorado State University. Libraries, 2013)
Wang, Yuan, author; Wang, Haonan, advisor; Scharf, Louis L., advisor; Breidt, F. Jay, committee member; Luo, Rockey J., committee member
This is a study of measurement compression and fusion design. The idea common to both problems is that measurements can often be linearly compressed into lower-dimensional spaces without introducing too much excess mean-squared error or excess volume in a concentration ellipse. The question is how to design the compression to minimize these excesses at any given dimension. The first part of this work is motivated by sensing and wireless communication, where data compression or dimension reduction may be used to reduce the required communication bandwidth. High-dimensional measurements are converted into low-dimensional representations through linear compression. Our aim is to compress a noisy measurement, allowing for the fact that the compressed measurement will be transmitted over a noisy channel. We review optimal compression with no transmission noise and show its connection with canonical coordinates. When the compressed measurement is transmitted with noise, we give closed-form expressions for the optimal compression matrix with respect to the trace and the determinant of the error covariance matrix. We show that the solutions are canonical coordinate solutions, scaled by coefficients which account for canonical correlations and the transmission noise variance, followed by a coordinate transformation into the sub-dominant invariant subspace of the channel noise. The second part of this work is a problem of integrating multiple sources of measurements. We consider two multiple-input multiple-output channels, a primary channel and a secondary channel, with dependent input signals. The primary channel carries the signal of interest, and the secondary channel carries a signal that shares a joint distribution with the primary signal. The problem of particular interest is designing the secondary channel when the primary channel is fixed. We formulate this as an optimization problem in which the optimal secondary channel maximizes an information-based criterion. An analytic solution is provided in a special case. Two fast-to-compute algorithms, one extrinsic and the other intrinsic, are proposed to approximate the optimal solutions in general cases. In particular, the intrinsic algorithm exploits the geometry of the unit sphere, a manifold embedded in Euclidean space. The performance of the proposed algorithms is examined through a simulation study. A discussion of the choice of dimension for the secondary channel is given, leading to rules for dimension reduction.
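Canonical coordinates, which the entry identifies as the backbone of the optimal compression, are computable from sample covariances by whitening each block and taking an SVD of the whitened cross-covariance. The sketch below does exactly that and then keeps the r dominant coordinates of the measurement block; the dimensions and the rank r are illustrative, and the closed-form noise-adjusted scaling derived in the dissertation is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)

# Joint samples of a signal x (p-dim) and a noisy measurement y (q-dim).
n, p, q, r = 5000, 4, 6, 2
A = rng.normal(size=(q, p))
X = rng.normal(size=(n, p))
Y = X @ A.T + 0.5 * rng.normal(size=(n, q))

# Canonical coordinates from sample covariances: whiten each block, then
# take the SVD of the whitened cross-covariance.
Sxx = np.cov(X, rowvar=False)
Syy = np.cov(Y, rowvar=False)
Sxy = (X - X.mean(0)).T @ (Y - Y.mean(0)) / (n - 1)

def inv_sqrt(S):
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

C = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
U, sv, Vt = np.linalg.svd(C)
print("canonical correlations:", np.round(sv[: min(p, q)], 3))

# Rank-r linear compression of y: keep the r dominant canonical
# coordinates of the measurement block.
T = Vt[:r] @ inv_sqrt(Syy)          # r x q compression matrix
y_compressed = Y @ T.T
print("compressed shape:", y_compressed.shape)
```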
Item Open Access: Model selection and nonparametric estimation for regression models (Colorado State University. Libraries, 2014)
He, Zonglin, author; Opsomer, Jean, advisor; Breidt, F. Jay, committee member; Meyer, Mary, committee member; Elder, John, committee member
In this dissertation, we deal with two different topics in statistics. The first topic, in survey sampling, deals with variable selection for a linear regression model from which we will sample with a possibly informative design. Under the assumption that the finite population is generated by a multivariate linear regression model, we study theoretically the variable selection criterion known as the predicted residual sum of squares in the sampling context. We examine the asymptotic properties of weighted and unweighted predicted residual sums of squares under weighted least squares regression estimation and ordinary least squares regression estimation. A simulation study of the variable selection criteria is provided, with the purpose of showing their ability to select the correct model in practical situations. For the second topic, we are interested in fitting a nonparametric regression model to data when some of the covariates are categorical. In the univariate case, where the covariate is an ordinal variable, we extend the local polynomial estimator, which normally requires continuous covariates, to one that allows for ordered categorical covariates. We derive the asymptotic conditional bias and variance of the local polynomial estimator with an ordinal covariate, under the assumption that the categories correspond to quantiles of an unobserved continuous latent variable. We conduct a simulation study with two patterns of ordinal data to evaluate our estimator. In the multivariate case, where the covariates contain a mixture of continuous, ordinal, and nominal variables, we use a Nadaraya-Watson estimator with a generalized product kernel. We derive the asymptotic conditional bias and variance of the Nadaraya-Watson estimator with continuous, ordinal, and nominal covariates, under the assumption that the categories of the ordinal covariate correspond to quantiles of an unobserved continuous latent variable. We conduct a multivariate simulation study to evaluate our Nadaraya-Watson estimator with the generalized product kernel.
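A generalized product kernel multiplies one kernel factor per covariate type. The sketch below uses a Gaussian kernel for the continuous covariate, a lam^|d| ordered kernel (Wang-van Ryzin type) for the ordinal covariate, and an Aitchison-Aitken kernel for a binary nominal covariate; the bandwidths are fixed illustrative values rather than data-driven choices.

```python
import numpy as np

rng = np.random.default_rng(6)

def nw_mixed(x0, o0, c0, X, O, C, y, h=0.3, lam_o=0.5, lam_c=0.3):
    """Nadaraya-Watson estimate at (x0, o0, c0) with a generalized product
    kernel: Gaussian in the continuous covariate X, ordered kernel
    lam_o**|d| in the ordinal covariate O, and an Aitchison-Aitken kernel
    (1 - lam_c if categories match, else lam_c for two categories) in the
    nominal covariate C."""
    k_cont = np.exp(-0.5 * ((X - x0) / h) ** 2)
    k_ord = lam_o ** np.abs(O - o0)
    k_nom = np.where(C == c0, 1 - lam_c, lam_c)
    w = k_cont * k_ord * k_nom
    return np.sum(w * y) / np.sum(w)

# Toy data: one continuous, one ordinal (0..3), one nominal (0/1) covariate.
n = 2000
X = rng.uniform(0, 1, n)
O = rng.integers(0, 4, n)
C = rng.integers(0, 2, n)
y = np.sin(2 * np.pi * X) + 0.5 * O + (C == 1) + 0.1 * rng.normal(size=n)

print(nw_mixed(0.25, 2, 1, X, O, C, y))   # near sin(pi/2) + 1 + 1 = 3
```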
Item Open Access: Penalized isotonic regression and an application in survey sampling (Colorado State University. Libraries, 2016)
Wu, Jiwen, author; Opsomer, Jean D., advisor; Meyer, Mary C., advisor; Breidt, F. Jay, committee member; Doherty, Paul, committee member
In isotonic regression, the mean function is assumed to be monotone increasing (or decreasing) but otherwise unspecified. The classical isotonic least-squares estimator is known to be inconsistent at the boundaries; this is called the spiking problem. A penalty on the range of the regression function is proposed to correct the spiking problem for univariate and multivariate isotonic models. The penalized estimator is shown to be consistent everywhere for a wide range of sizes of the penalty parameter. For the univariate case, the optimal penalty is shown to depend on the derivatives of the true regression function at the boundaries. Pointwise confidence intervals are constructed using the penalized estimator and bootstrapping ideas; these are shown through simulations to behave well in moderately sized samples. Simulation studies also show that the power of the hypothesis test of a constant versus an increasing regression function improves substantially compared to the power of the test with the unpenalized alternative, and also compares favorably to tests using parametric alternatives. The application of isotonic regression is also considered in the survey context, where many variables contain natural orderings that should be respected in the estimates. For instance, the National Compensation Survey estimates mean wages for many job categories, and these mean wages are expected to be non-decreasing according to job level. In this type of situation, isotonic regression can be applied to give constrained estimators satisfying the monotonicity. We combine domain estimation and the pooled adjacent violators algorithm to construct new design-weighted constrained estimators. The resulting estimator is the classical design-based domain estimator, but after adaptive pooling of neighboring domains, so that it is both readily implemented in large-scale surveys and easy to explain to data users. Under mild conditions on the sampling design and the population, the estimators are shown to be design consistent and asymptotically normal. Confidence intervals for domain means using linearization-based and replication-based variance estimation show marked improvements compared to survey estimators that do not incorporate the constraints. Furthermore, a cone projection algorithm is implemented in the domain mean estimator to accommodate qualitative constraints in the case of two covariates. Theoretical properties of the constrained estimators are investigated, and a simulation study demonstrates the improvement of the confidence intervals when using the constrained estimator. We also provide a relaxed monotone constraint to loosen the qualitative assumptions, where the extent of departure from monotonicity can be controlled by a weight function and a chosen bandwidth. We compare the unconstrained estimator, the constrained estimator without penalty, the constrained estimator with penalty, and the relaxed constrained estimator. Incorporating the constraints improves the confidence intervals, with higher coverage rates and shorter interval lengths, and the penalized version fixes the spiking problem at the boundary.
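The adaptive pooling in this entry is driven by the pooled adjacent violators algorithm (PAVA). A minimal weighted version is sketched below, with domain estimates weighted by illustrative domain sizes standing in for the survey-weighted quantities of the dissertation.

```python
import numpy as np

def weighted_pava(y, w):
    """Pooled adjacent violators: the weighted least-squares fit that is
    monotone nondecreasing.  y are domain estimates ordered by the
    covariate; w are weights (here, stand-in domain sizes)."""
    vals, wts, counts = [], [], []
    for yi, wi in zip(map(float, y), map(float, w)):
        vals.append(yi); wts.append(wi); counts.append(1)
        # Pool backwards while the monotonicity constraint is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            wt = wts[-2] + wts[-1]
            val = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wt
            vals[-2:] = [val]; wts[-2:] = [wt]
            counts[-2:] = [counts[-2] + counts[-1]]
    return np.repeat(vals, counts)

# Domain means expected to be nondecreasing, with sampling blips.
est = [2.0, 2.5, 2.3, 3.1, 3.0, 4.2]
n_dom = [40, 25, 30, 50, 45, 60]
print(weighted_pava(est, n_dom))
```

Violating neighbors (2.5, 2.3) and (3.1, 3.0) are pooled into weighted averages, which is exactly the "adaptive pooling of neighboring domains" the entry describes.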
Item Open Access: Regression of network data: dealing with dependence (Colorado State University. Libraries, 2019)
Marrs, Frank W., author; Fosdick, Bailey K., advisor; Breidt, F. Jay, committee member; Zhou, Wen, committee member; Wilson, James B., committee member
Network data, which consist of measured relations between pairs of actors, characterize some of the most pressing problems of our time, from environmental treaty legislation to human migration flows. A canonical problem in analyzing network data is to estimate the effects of exogenous covariates on a response that forms a network. Unlike typical regression scenarios, network data often naturally engender excess statistical dependence, beyond that represented by covariates, due to relations that share an actor. For analyzing bipartite network data observed over time, we propose a new model that accounts for excess network dependence directly, as this dependence is of scientific interest. In an example of international state interactions, we are able to infer the networks of influence among the states, such as which states' military actions are likely to incite other states' military actions. In the remainder of the dissertation, we focus on situations where inference on the effects of exogenous covariates on the network is the primary goal of the analysis, and thus the excess network dependence is a nuisance effect. In this setting, we leverage an exchangeability assumption to propose novel parsimonious estimators of regression coefficients for both binary and continuous network data, and new estimators of coefficient standard errors for continuous network data. The exchangeability assumption we rely upon is pervasive in network and array models in the statistics literature, but had not previously been considered when adjusting for dependence in a regression of network data. Although the estimators we propose are aligned with many network models in the literature, they are derived from the assumption of exchangeability rather than from a particular parametric model for representing excess network dependence in the data.

Item Open Access: Sliced inverse approach and domain recovery for stochastic inverse problems (Colorado State University. Libraries, 2021)
Chi, Jiarui, author; Wang, Haonan, advisor; Estep, Don, advisor; Breidt, F. Jay, committee member; Tavener, Simon, committee member; Zhou, Wen, committee member
This dissertation tackles several critical challenges related to the Stochastic Inverse Problem (SIP), performing scientific inference and prediction for complex physical systems that are characterized by mathematical models, e.g. differential equations. We treat both discrete and continuous cases. The SIP concerns inferring the values and quantifying the uncertainty of the inputs of a model, which are considered random and unobservable quantities governing system behavior, using observational data on the model outputs. Uncertainty about the inputs is quantified through probability distributions on the input domain which induce the probability distribution on the outputs realized by the observational data. The formulation of the SIP is based on rigorous measure-theoretic probability theory that uses all the information encapsulated in both the model and the data. We introduce a problem in which a portion of the inputs can be observed and varied to study the hidden inputs, and we employ a formulation that uses all the knowledge in multiple experiments obtained by varying the observable inputs. Since the map that the model induces is typically not one-to-one, an ansatz, i.e. an assumption of some prior information, must be imposed in order to determine a specific solution of the SIP. The resulting solution is heavily conditioned on the observable inputs, and we seek to combine solutions from different values of the observable inputs in order to reduce that dependence. We propose an approach for combining the individual solutions based on the framework of Dempster-Shafer theory, which removes the dependence on the experiments as well as on the ansatz, and provides useful distributional information about the unobservable inputs, more specifically, about the ansatz. We develop an iterative algorithm that updates the ansatz information in order to obtain a best form of the solution across all experiments. The philosophy of Bayesian approaches is similar to that of the SIP in the sense that both consider random variables as the model inputs and seek to update the unobservable solution using information obtained from observations. We extend the classical Bayesian approach in the context of the SIP by incorporating the knowledge of the model. The input domain is a pre-specified condition for the SIP, given by scientific knowledge, and is often assumed to be a compact metric space. The supports of the probability distributions computed in the SIP are restricted to this domain, so an inappropriate choice of domain might cause a massive loss of information in the solutions. Accordingly, we combine the individual solutions from multiple experiments to recover a unique domain among the many choices of domain induced by the distribution of the inputs in general cases. In particular, results on the convergence of the domain recovery in linear models are investigated.
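The combination step in the entry above takes place within the Dempster-Shafer framework. As a reminder of the combination rule itself, the sketch below applies Dempster's rule to two mass functions on a toy three-point frame; the dissertation's continuous setting and its iterative ansatz update are well beyond this illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions over subsets
    of a finite frame, each given as {frozenset: mass}.  Mass assigned to
    empty intersections (conflict) is renormalized away."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    return {s: v / (1 - conflict) for s, v in combined.items()}

# Two experiments' evidence about an unobservable input in {lo, mid, hi}.
m1 = {frozenset({"lo", "mid"}): 0.7, frozenset({"lo", "mid", "hi"}): 0.3}
m2 = {frozenset({"mid", "hi"}): 0.6, frozenset({"lo", "mid", "hi"}): 0.4}
print(dempster_combine(m1, m2))
```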
Item Open Access: Some topics on survey estimators under shape constraints (Colorado State University. Libraries, 2021)
Xu, Xiaoming, author; Meyer, Mary C., advisor; Breidt, F. Jay, committee member; Zhou, Wen, committee member; Chong, Edwin K. P., committee member
We consider three topics in this dissertation: 1) nonresponse weighting adjustment using estimated response probabilities; 2) improved variance estimation for inequality-constrained domain mean estimators in surveys; and 3) one-sided testing of population domain means in surveys. Weighting by the inverse of the estimated response probabilities is a common type of adjustment for nonresponse in surveys. In the first topic, we propose a new survey estimator under nonresponse in which the response model is set in linear form and the parameters are estimated by fitting a constrained least squares regression model, the constraint being a calibration equation. We examine asymptotic properties of the Horvitz-Thompson and Hájek versions of the estimators. Variance estimation for the proposed estimators is also discussed. In a limited simulation study, the performance of the estimators is compared with that of the corresponding uncalibrated estimators in terms of unbiasedness, MSE, and coverage rate. In survey domain estimation, a priori information can often be imposed in the form of linear inequality constraints on the domain estimators. Wu et al. (2016) formulated the isotonic domain mean estimator for the simple order restriction, and methods for more general constraints were proposed in Oliva-Avilés et al. (2020). When the assumptions are valid, imposing restrictions on the estimators ensures that the a priori information is respected and, in addition, allows information to be pooled across domains, resulting in estimators with smaller variance. In the second topic, we propose a method to further improve the estimation of the covariance matrix for these constrained domain estimators, using a mixture of the possible covariance matrices obtained from the inequality constraints. We prove consistency of the improved variance estimator, and simulations demonstrate that the new estimator results in improved coverage probabilities for domain mean confidence intervals while retaining the smaller confidence interval lengths. Recent work in survey domain estimation allows for estimation of population domain means under a priori assumptions expressed as linear inequality constraints, and imposing the constraints has been shown to provide estimators with smaller variance and tighter confidence intervals. In the third topic, we consider a formal test of the null hypothesis that all the constraints are binding versus the alternative that at least one constraint is non-binding. The test of constant versus increasing domain means is a special case. The power of the test is substantially better than that of the test with an unconstrained alternative. The new test is used with data from the National Survey of College Graduates to show that salaries are positively related to the subject's father's educational level, across fields of study and over several years of cohorts.
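The first topic's estimator can be mimicked on toy data: fit a linear response-probability model by least squares subject to a calibration equation, then weight respondents by the inverted fitted probabilities. The sketch below does this with a generic solver; the covariate design, the design weights, the parameter bounds, and the unpenalized loss are illustrative assumptions, not the dissertation's construction.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Sample of size n with design weights d, covariate x, response indicator r.
n = 1000
d = rng.uniform(1, 3, n)                      # design weights
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
p_true = 0.4 + 0.5 * X[:, 1]                  # linear response probability
r = rng.uniform(size=n) < p_true
y = 2 + 3 * X[:, 1] + rng.normal(size=n)      # study variable

def loss(beta):
    # Least-squares fit of the linear response model.
    return np.sum((r - X @ beta) ** 2)

def calib(beta):
    # Calibration: inverse-probability-weighted respondent totals of x
    # must match the full-sample design-weighted totals.
    p = X @ beta
    return X[r].T @ (d[r] / p[r]) - X.T @ d

res = minimize(loss, np.array([0.5, 0.3]), method="SLSQP",
               bounds=[(0.05, 1.0), (0.0, 1.0)],
               constraints=[{"type": "eq", "fun": calib}])
p_hat = X @ res.x

# Hajek-type estimator of the mean of y using the adjusted weights.
w = d[r] / p_hat[r]
print("adjusted:", (w @ y[r]) / w.sum(), " full-sample mean:", y.mean())
```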
Item Open Access: Statistical modeling and inference for spatial and spatio-temporal data (Colorado State University. Libraries, 2019)
Liu, Jialuo, author; Wang, Haonan, advisor; Breidt, F. Jay, committee member; Kokoszka, Piotr S., committee member; Luo, Rockey J., committee member
Spatio-temporal processes with a continuous index in space and time are encountered in many scientific disciplines, such as climatology, environmental sciences, and public health. A fundamental component for modeling such spatio-temporal processes is the covariance function, which is traditionally assumed to be stationary. While convenient, this stationarity assumption can be unrealistic in many situations. In the first part of this dissertation, we develop a new class of locally stationary spatio-temporal covariance functions. A novel spatio-temporal expanding distance (STED) asymptotic framework is proposed to study the properties of statistical inference. The STED asymptotic framework is established on a fixed spatio-temporal domain, aiming to characterize spatio-temporal processes that are globally nonstationary in a rescaled fixed domain and locally stationary in a distance-expanding domain. The utility of STED is illustrated by establishing the asymptotic properties of maximum likelihood estimation for a general class of spatio-temporal covariance functions, as well as by a simulation study which suggests sound finite-sample properties. Then, we address the problem of simultaneous estimation of the mean and covariance functions for continuously indexed spatio-temporal processes. A flexible spatio-temporal model with partially linear regression in the mean function and local stationarity in the covariance function is proposed. We study a profile likelihood method for estimation in the presence of spatio-temporally correlated errors. Specifically, for the nonparametric component, we employ a family of bimodal kernels to alleviate bias, which may be of independent interest for semiparametric spatial statistics. The theoretical properties of our profile likelihood estimation, including consistency and asymptotic normality, are established. A simulation study is conducted and corroborates our theoretical findings, while a health hazard data example further illustrates the methodology. The maximum likelihood method for irregularly spaced spatial datasets is computationally intensive, as it involves the manipulation of sizable dense covariance matrices; finding the exact likelihood is generally impractical, especially for large datasets. In the third part, we present an approximation to the Gaussian log-likelihood function using Krylov subspace methods. This method reduces the computational complexity from O(N³) operations to O(N²) for dense matrices, and further to quasi-linear if the matrices are sparse. Specifically, we implement the conjugate gradient method to solve linear systems iteratively, and use a Monte Carlo method together with a Gauss quadrature rule to obtain a stochastic estimator of the log-determinant. We give conditions that ensure consistency of the estimators. Simulation studies explore various important computational aspects, including complexity, accuracy, and efficiency. We also apply our proposed method to estimate the spatial structure of a big LiDAR dataset.
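The third part's recipe, conjugate gradients for the linear solve plus a Monte Carlo estimator of the log-determinant via quadrature, can be sketched compactly. Below, Hutchinson probing is combined with Lanczos quadrature (one standard realization of the Gauss-quadrature idea) on a sparse covariance; the probe count, Lanczos depth, and tridiagonal test matrix are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal
from scipy.sparse import diags
from scipy.sparse.linalg import cg

rng = np.random.default_rng(8)

def lanczos_logquad(A, z, m=30):
    """Estimate z' log(A) z for sparse SPD A via m-step Lanczos quadrature."""
    n = len(z)
    Q = np.zeros((n, m)); a = np.zeros(m); b = np.zeros(max(m - 1, 1))
    Q[:, 0] = z / np.linalg.norm(z)
    k = m
    for j in range(m):
        v = A @ Q[:, j]
        a[j] = Q[:, j] @ v
        v -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)   # full reorthogonalization
        if j < m - 1:
            b[j] = np.linalg.norm(v)
            if b[j] < 1e-10:      # Krylov space exhausted early
                k = j + 1
                break
            Q[:, j + 1] = v / b[j]
    theta, U = eigh_tridiagonal(a[:k], b[: k - 1])
    return (z @ z) * np.sum(U[0] ** 2 * np.log(theta))

def approx_loglik(A, y, n_probes=30, m=30):
    """-0.5 * (y' A^{-1} y + log|A| + n log 2 pi): CG for the solve,
    Hutchinson probes plus Lanczos quadrature for the log-determinant."""
    n = len(y)
    x, _ = cg(A, y)
    logdet = np.mean([lanczos_logquad(A, rng.choice([-1.0, 1.0], size=n), m)
                      for _ in range(n_probes)])
    return -0.5 * (y @ x + logdet + n * np.log(2 * np.pi))

# Sparse tridiagonal SPD covariance, n = 500; compare with the dense answer.
n = 500
A = diags([[-0.3] * (n - 1), [1.2] * n, [-0.3] * (n - 1)], [-1, 0, 1]).tocsc()
y = rng.normal(size=n)
exact = -0.5 * (y @ np.linalg.solve(A.toarray(), y)
                + np.linalg.slogdet(A.toarray())[1] + n * np.log(2 * np.pi))
print("approx:", approx_loglik(A, y), " exact:", exact)
```

Each probe costs only sparse matrix-vector products, which is where the quasi-linear complexity claimed in the entry comes from.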
Item Open Access: Statistical modeling and inferences on directed networks (Colorado State University. Libraries, 2024)
Du, Wenqin, author; Zhou, Wen, advisor; Breidt, F. Jay, committee member; Meyer, Mary, committee member; Pezeshki, Ali, committee member
Network data have received great attention for elucidating comprehensive insights into node interactions and underlying network dynamics. This dissertation contributes new modeling tools and inference procedures to the field of network analysis, incorporating the dependence structure inherently introduced by network data. Our first direction centers on modeling directed edges with count measurements, an area that has received limited attention in the literature. Most existing methods either assume that count edges are derived from continuous random variables or model the edge dependence by parametric distributions. In this dissertation, we develop a latent multiplicative Poisson model for directed networks with count edges. Our approach directly models the edge dependence of count data through the pairwise dependence of latent errors, which are assumed to be weakly exchangeable. This assumption not only covers a variety of common network effects but also leads to a concise representation of the error covariance. In addition, identification and inference for the mean structure, as well as for the regression coefficients, depend on the errors only through their covariance, which provides substantial flexibility for our model. We propose a pseudo-likelihood based estimator of the regression coefficients that enjoys consistency and asymptotic normality. We evaluate our method in extensive numerical studies that corroborate the theory, and apply our model to food sharing network data to reveal interesting network effects that are further verified in the literature. In the second project, we study inference procedures for network dependence structures. While much research has targeted network-covariate associations and community detection, inference for important network effects such as reciprocity and sender-receiver effects has been largely overlooked. Testing network effects for network data or weighted directed networks is challenging due to the intricate potential edge dependence. Most existing methods are model-based, carrying strong assumptions with restricted applicability. In contrast, we present a novel, fully nonparametric framework that requires only minimal regularity assumptions. While inspired by recent developments in the U-statistic literature, our work significantly broadens their scope. Specifically, we identify and carefully address the indeterminate degeneracy inherent in network effect estimators, a challenge that the aforementioned tools do not handle. We establish a Berry-Esseen type bound for the accuracy of type-I error rate control, and a novel analysis shows the minimax optimality of our test's power. Simulations highlight the superiority of our method in computation speed, accuracy, and numerical robustness relative to benchmarks. To showcase the practicality of our methods, we apply them to two real-world relationship networks, one a faculty hiring network and the other an international trade network. Finally, this dissertation introduces modeling strategies and corresponding methods for discerning the core-periphery (CP) structure in weighted directed networks. We adopt the signal-plus-noise model, categorizing uniform relational patterns as non-informative, by which we define the sender and receiver peripheries. Furthermore, instead of confining the core component to a specific structure, we consider it complementary to either the sender or receiver periphery. Based on these definitions, we propose spectral algorithms to identify the CP structure in weighted directed networks. Our algorithms come with statistical guarantees, ensuring identification of the sender and receiver peripheries with overwhelming probability, and they scale effectively to large directed networks. We evaluate the proposed methods in extensive simulation studies and apply them to faculty hiring network data, revealing captivating insights into informative and non-informative sender/receiver behaviors.
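Among the network effects mentioned above, reciprocity is the easiest to make concrete: it compares each directed edge with its mirror image. The sketch below computes a naive correlation version of this point estimate; as the entry emphasizes, a valid test additionally requires handling the dependence between dyads that share an actor, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(9)

def reciprocity(Y):
    """Sample correlation between Y_ij and Y_ji over unordered dyads of a
    weighted directed adjacency matrix (diagonal ignored).  This is the
    point estimate only; its null distribution is nonstandard because
    dyads sharing an actor are dependent."""
    iu = np.triu_indices(Y.shape[0], k=1)
    return np.corrcoef(Y[iu], Y.T[iu])[0, 1]

# Directed network with built-in reciprocity via a shared symmetric effect.
n = 80
common = rng.normal(size=(n, n))
common = (common + common.T) / np.sqrt(2)
Y = common + rng.normal(size=(n, n))     # corr(Y_ij, Y_ji) = 0.5 by design
np.fill_diagonal(Y, 0.0)
print("estimated reciprocity:", round(reciprocity(Y), 3))
```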
Item Open Access: Survey estimators of domain means under shape restrictions (Colorado State University. Libraries, 2018)
Oliva Avilés, Cristian M., author; Meyer, Mary C., advisor; Opsomer, Jean D., advisor; Breidt, F. Jay, committee member; Wang, Haonan, committee member; Wilson, Kenneth R., committee member
Novel methodologies that introduce shape-restricted regression techniques into survey domain estimation and inference are presented in this dissertation. Although population domain means are frequently expected to respect shape constraints that arise naturally in the survey data, their most common direct estimators often violate such restrictions, especially when the variability of these estimators is high. Recently, a monotone estimator obtained by adaptively pooling neighboring domains was proposed. When the monotonicity assumption on the population domain means is reasonable, the monotone estimator leads to asymptotically valid estimation and inference, and can yield substantial improvements in efficiency in comparison with unconstrained estimators. Motivated by these convenient properties of the monotone estimator, the two main questions addressed in this dissertation are: first, since invalid monotonicity restrictions may lead to biased estimators, how to make a data-driven decision about whether a restriction violation in the sample arises from an actual violation in the population or simply by chance; and second, how the monotone estimator can be extended to a more general constrained estimator that allows for many other types of shape restrictions beyond monotonicity. In this dissertation, the Cone Information Criterion for Survey Data (CICs) is proposed to detect monotonicity departures in population domain means. The CICs is shown to lead to a consistent methodology that makes an asymptotically correct decision when choosing between unconstrained and constrained domain mean estimators. In addition, a design-based estimator of domain means that respects inequality constraints represented through irreducible matrices is presented. This constrained estimator is shown to be consistent and asymptotically normally distributed under mild conditions, provided the assumed restrictions are reasonable for the population. Further, simulation experiments demonstrate that both the estimation and the variability of domain means are improved by constrained estimates, in comparison with unconstrained estimates, mainly in domains with small sample sizes. These methodologies are applied to analyze data from the 2011-2012 U.S. National Health and Nutrition Examination Survey and the 2015 U.S. National Survey of College Graduates. In terms of software development, and outside of the survey context, the package bcgam is developed in R to fit constrained generalized additive models using a Bayesian approach. The main routines of bcgam allow users to easily specify their model of interest and to produce numerical and graphical output. The package bcgam is available from the Comprehensive R Archive Network.
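The inequality-constrained estimator described here is, at its core, a projection of the unconstrained domain estimates onto a polyhedral cone in the metric of their covariance. The sketch below performs that projection with a generic solver for a simple monotone constraint matrix; the estimates, covariance, and constraints are toy values, and the dissertation's dedicated cone projection algorithm is replaced by scipy's trust-constr.

```python
import numpy as np
from scipy.optimize import LinearConstraint, minimize

# Project unconstrained domain estimates onto the cone {theta : A theta >= 0},
# in the metric given by the inverse of their estimated covariance.
theta_hat = np.array([1.0, 1.4, 1.3, 2.1])          # domain estimates
V = np.diag([0.05, 0.08, 0.02, 0.06])               # estimated covariance
A = np.array([[-1, 1, 0, 0],                        # monotone nondecreasing
              [0, -1, 1, 0],
              [0, 0, -1, 1]])

W = np.linalg.inv(V)
obj = lambda t: (t - theta_hat) @ W @ (t - theta_hat)
res = minimize(obj, theta_hat, method="trust-constr",
               constraints=[LinearConstraint(A, lb=0, ub=np.inf)])
print("constrained estimates:", np.round(res.x, 3))
```

The violating pair (1.4, 1.3) is pulled to a common value weighted by the inverse variances, while estimates already satisfying the constraints are left essentially unchanged.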
Item Open Access: Topics in design-based and Bayesian inference for surveys (Colorado State University. Libraries, 2012)
Hernandez-Stumpfhauser, Daniel, author; Opsomer, Jean, advisor; Breidt, F. Jay, committee member; Hoeting, Jennifer A., committee member; Kreidenweis, Sonia M., committee member
We deal with two different topics in statistics. The first topic, in survey sampling, concerns variance and variance estimation for estimators of model parameters in the design-based approach to analytical inference for survey data, when the sampling weights include post-sampling weight adjustments such as calibration. Under the design-based approach, estimators of model parameters, if available in closed form, are written as functions of estimators of population totals and means. We examine properties of these estimators, in particular their asymptotic variances, and show how ignoring the post-sampling weight adjustments, i.e. treating sampling weights as inverses of inclusion probabilities, results in biased variance estimators. Two simple simulation studies for two common estimators, an estimator of a population ratio and an estimator of regression coefficients, are provided with the purpose of showing situations in which ignoring the post-sampling weight adjustments results in significantly biased variance estimators. For the second topic, we consider Bayesian inference for directional data using the projected normal distribution. We show how the models can be estimated using Markov chain Monte Carlo methods after the introduction of suitable latent variables. The cases of random samples, regression, model comparison, and Dirichlet process mixture models are covered, motivated by a very large dataset of daily departures of anglers. Because the number of parameters increases with sample size, alternatives need to be explored. We investigate mean field variational methods and identify a number of problems in the application of the method to these models, caused by the poor approximation of the variational distribution to the posterior distribution. We propose solutions to those problems by improving the mean field variational approximation, through the use of the Laplace approximation in the regression case and through novel Monte Carlo procedures in the mixture model case.

Item Open Access: Weighting adjustments in surveys (Colorado State University. Libraries, 2017)
Fu, Ran, author; Opsomer, Jean D., advisor; Breidt, F. Jay, committee member; Kokoszka, Piotr, committee member; Mushinski, David, committee member
We consider three topics in this dissertation: 1) nonresponse weighting adjustment using penalized spline regression; 2) improving survey estimators through weight smoothing; and 3) an investigation of weight smoothing estimators under mixed model specifications. In the first topic, we propose a new survey estimator under nonresponse which assumes only that the response propensity is a smooth function of a known covariate, and we estimate the propensity function by fitting a nonparametric logistic model using penalized spline regression. We obtain the linearization of the nonresponse weighting adjustment estimator with respect to the sampling design and the random response mechanism, allowing us to perform asymptotically correct inference. In a simulation study, we show that the nonparametric estimator remains competitive with a linear logistic estimator when the response propensity function follows a linear logistic model, but performs significantly better when the response propensity function is nonlinear. Beaumont (2008) proposed model-based weight smoothing as a way to improve the efficiency of survey estimators. In the second topic, we extend this work by obtaining the asymptotic properties of this approach with respect to the sampling design and the weight model, the latter taken to be a lognormal linear regression model. We derive the asymptotic distribution of the estimator and propose a consistent estimator of the asymptotic variance. A Hájek version of the estimator is considered, as well as a replication variance estimator, both of which improve the robustness of weight smoothing against model misspecification. In the third topic, the results of the second topic are extended to models with random effects. Two versions of the estimator are proposed, depending on whether the random effects are predicted or integrated out, and their practical performance is compared through a simulation study.
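For the first topic of the final entry, a penalized spline logistic fit of the response propensity can be sketched directly: build a B-spline basis in the known covariate, maximize a ridge-penalized Bernoulli likelihood, and reweight respondents by the inverted fitted propensities. The basis size, the penalty weight lam, and the plain ridge penalty (standing in for a more careful penalized-spline construction) are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(10)

# Response propensity is a smooth, nonlinear function of a known covariate x.
n = 2000
x = rng.uniform(0, 1, n)
p_true = 1 / (1 + np.exp(-(np.sin(2 * np.pi * x) + 0.5)))
r = rng.uniform(size=n) < p_true              # response indicators
y = 1 + 2 * x + rng.normal(size=n)            # study variable

# Cubic B-spline basis in x with a ridge penalty on the coefficients.
deg = 3
interior = np.linspace(0, 1, 12)[1:-1]
knots = np.concatenate([[0.0] * (deg + 1), interior, [1.0] * (deg + 1)])
B = BSpline.design_matrix(x, knots, deg).toarray()
lam = 1.0

def penalized_nll(c):
    # Negative Bernoulli log-likelihood plus ridge penalty.
    eta = B @ c
    return np.sum(np.log1p(np.exp(eta)) - r * eta) + lam * c @ c

c_hat = minimize(penalized_nll, np.zeros(B.shape[1]), method="BFGS").x
p_hat = 1 / (1 + np.exp(-(B @ c_hat)))

# Nonresponse-adjusted (Hajek-type) estimate of the mean of y.
w = 1 / p_hat[r]
print("adjusted:", (w @ y[r]) / w.sum(),
      " naive respondent mean:", y[r].mean(),
      " full-sample mean:", y.mean())
```

Because nonresponse here is covariate-dependent, the naive respondent mean is biased, while the propensity-weighted estimate tracks the full-sample mean.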