Browsing by Author "Draper, Bruce, committee member"
Now showing 1 - 20 of 25
Item Open Access Automated tropical cyclone eye detection using discriminant analysis (Colorado State University. Libraries, 2015) DeMaria, Robert, author; Anderson, Charles, advisor; Draper, Bruce, committee member; Schubert, Wayne, committee member
Eye formation is often associated with rapid intensification of tropical cyclones, so this information is very valuable to hurricane forecasters. Linear and Quadratic Discriminant Analysis (LDA and QDA) were utilized to develop a method for objectively determining whether or not a tropical cyclone has an eye. The input to the algorithms included basic storm information that is routinely available to forecasters, including the maximum wind, latitude and longitude of the storm center, and the storm motion vector. Infrared (IR) imagery from geostationary satellites in a 320 km by 320 km region around each storm was also used as input. Principal Component Analysis was used to reduce the dimension of the IR dataset. The ground truth for the algorithm development was the subjective determination, made by hurricane forecasters, of whether or not a tropical cyclone had an eye. The input sample included 4109 cases at 6 hr intervals for Atlantic tropical cyclones from 1995 to 2013. Results showed that the LDA and QDA algorithms successfully classified about 90% of the test cases. The best algorithm used a combination of basic storm information and principal components from the IR imagery. These included the maximum winds, the storm latitude and components of the storm motion vector, and 10 PCs from eigenvectors that primarily represented the symmetric structures in the IR imagery. The QDA version performed a little better using the Peirce Skill Score, which measures the ability to correctly classify cases. The LDA and QDA algorithms also provide the probability that each case contains an eye. The LDA version performed a little better using the Brier Skill Score, which measures the utility of the class probabilities. The high success rate indicates that the algorithm can reliably reproduce what forecasters are currently doing subjectively. This algorithm would have a number of applications, including providing forecasters with an objective way to determine whether a tropical cyclone has an eye or is becoming more likely to form one. The probability information and its time trends could be used as input to other algorithms, such as existing operational forecast methods for estimating tropical cyclone intensity changes.

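As a purely illustrative aside (not the author's code or data), the sketch below shows how LDA and QDA classifiers over PCA-reduced imagery plus basic storm predictors can be assembled with scikit-learn, along with simple Peirce and Brier measures; the arrays, the 10-component reduction, and the 0.5 threshold are invented stand-ins for the quantities named in the abstract above.

```python
# Illustrative sketch only (not the author's code): LDA/QDA eye detection on
# synthetic stand-ins for storm predictors and PCA-reduced IR imagery.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.metrics import brier_score_loss, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
ir = rng.normal(size=(n, 1024))      # stand-in for flattened IR imagery
storm = rng.normal(size=(n, 4))      # max wind, latitude, motion u/v (illustrative)
has_eye = (storm[:, 0] + ir[:, 0] + rng.normal(size=n) > 0).astype(int)

# Reduce the IR imagery to 10 principal components, then append storm
# predictors (PCA fit on all data here only for brevity).
pcs = PCA(n_components=10).fit_transform(ir)
X = np.hstack([storm, pcs])
Xtr, Xte, ytr, yte = train_test_split(X, has_eye, test_size=0.25, random_state=0)

for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    model.fit(Xtr, ytr)
    prob = model.predict_proba(Xte)[:, 1]              # eye probability per case
    pred = (prob > 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(yte, pred).ravel()
    peirce = tp / (tp + fn) + tn / (tn + fp) - 1        # Peirce skill score
    print(type(model).__name__, "Peirce:", round(peirce, 3),
          "Brier loss:", round(brier_score_loss(yte, prob), 3))
```
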
Item Open Access Color memory for objects with prototypical color mismatch (Colorado State University. Libraries, 2013) Opper, Jamie K., author; Monnier, Patrick, advisor; Draper, Bruce, committee member; Rhodes, Matthew, committee member
Many studies have demonstrated the effect of top-down influences on color preference and memory, but these have primarily studied short-term memory or color memory in the abstract (e.g., the experimenter names an object or substance and the subject produces a subjective match without first being exposed to a stimulus). The present study examined the effect of object color prototypicality and how such prototypicality might influence memory for colors of objects presented in non-prototypical colors (e.g., a banana presented as blue). A match between an object's prototypical and presentation colors appeared to facilitate the accuracy of matching and increase participants' confidence that they achieved a correct match; a prototypical color mismatch impaired subjects' ability to achieve a correct match. For stimuli presented in their prototypical colors, subjects tended to remember highly saturated stimuli as less saturated, and desaturated stimuli as more saturated, indicating a sort of "regression to a saturation mean". This effect did not occur for stimuli presented in a non-prototypical color or stimuli presented as simple colored circles. Evidence was not found, however, for a systematic influence of object color prototypicality on the hue and/or luminance of subjects' produced matches.

Item Open Access Configural asymmetries: an effect of context and object based processes (Colorado State University. Libraries, 2010) Edler, Joshua R., author; Monnier, Patrick, advisor; Draper, Bruce, committee member; Clegg, Benjamin, committee member
Configural asymmetries refer to differences in visual search performance in which displays composed of objects requiring left/right judgments are slower to process and incur more errors than displays composed of objects requiring up/down judgments. Two accounts of the effect have emerged in the literature. The Object Region Account, an object-based explanation, posits that configural asymmetries are driven by differences in processing the up/down versus left/right regions of an individual object, with left/right regions being less finely processed. The Inter-item Symmetry Account, a context-based explanation, posits that configural asymmetries are due to mirror symmetry relationships shared between multiple elements in the search display. Specifically, objects sharing vertical mirror symmetry are perceived as more similar and therefore harder to process than objects sharing horizontal mirror symmetry. This study attempted to test and separate these two accounts. Measurements demonstrated that mirror symmetry relationships alone between target and distractors indeed produced an asymmetry in search performance: horizontal mirror symmetry was easier to search through than vertical mirror symmetry. However, the magnitude of the effect produced solely by mirror symmetry was noticeably smaller than the effect obtained when objects required left/right versus up/down comparisons (e.g., Monnier, Atarha, Edler, & Birks, 2010; Van Zoest, Giesbrecht, Enns, & Kingstone, 2006). Furthermore, when mirror symmetry was held between distractors, the reverse effect was found: vertical mirror symmetry was easier to search through than horizontal mirror symmetry. These measurements suggest that configural asymmetries are best understood as an interaction of both object-based and context-based processes, and provide support for the idea that mirror symmetry is a dimension by which the visual system groups objects.

Item Open Access Controlled and automatic processing in implicit learning (Colorado State University. Libraries, 2012) Mong, Heather Marie Skeba, author; Seger, Carol, advisor; DeLosh, Ed, committee member; Volbrecht, Vicki, committee member; Draper, Bruce, committee member
This dissertation proposes a new approach for measuring the cognitive outcomes of learning from implicit tasks: measure the controlled and automatic processes in use by participants after training, and focus on how controllable the acquired knowledge is under different learning conditions as measured through a process-dissociation procedure. This avoids the uncertainty of any explicit knowledge test's ability to exhaustively measure the contents of consciousness, and provides a different way to view the cognitive changes due to implicit task training. This dissertation includes three experiments using two different implicit learning tasks (serial response reaction time [SRTT] and contextual cuing) to test how controllable the knowledge gained from these tasks is. The first two experiments used the SRTT, in which participants have to make the appropriate corresponding spatial response when presented with a visual stimulus in one of four locations. The trained information is a repeating 12-item response series, which participants are not typically told is repeating. These experiments found use of both controlled and automatic processes by participants. When participants were cued that a sequence was repeating (Experiment 2), there was significantly less use of controlled processes than when participants were not cued into the sequence repetition, suggesting a shift away from controlled processes when explicitly learning the repeating information. The third experiment used the contextual cuing visual search task, which requires participants to rapidly locate a target (T) in a field of distracters (L). Participants become faster at locating the target within repeating spatial configurations across training. Experiment 3 also found use of both controlled and automatic processes after training. However, cuing the repetition did not change either controlled or automatic process estimates, suggesting that control over acquired knowledge is not affected by intent to learn. Altogether, the process dissociation approach provides process estimates congruent with existing theoretical explanations of the two implicit learning tasks, and is a useful addition to the techniques available to study implicit learning.

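For readers unfamiliar with the process-dissociation procedure mentioned above, the snippet below computes the standard controlled/automatic estimates from inclusion and exclusion performance in the style of Jacoby's procedure; the proportions are hypothetical and nothing here is taken from the dissertation's data.

```python
# Hedged illustration of standard process-dissociation estimates
# (inclusion/exclusion logic in the style of Jacoby, 1991); the proportions
# below are invented.
def process_dissociation(p_inclusion: float, p_exclusion: float):
    """Return (controlled, automatic) estimates.

    Inclusion test: respond with trained material = C + A*(1 - C)
    Exclusion test: trained material intrudes     = A*(1 - C)
    """
    controlled = p_inclusion - p_exclusion
    automatic = p_exclusion / (1 - controlled) if controlled < 1 else float("nan")
    return controlled, automatic

print(process_dissociation(p_inclusion=0.70, p_exclusion=0.35))  # (0.35, ~0.54)
```
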
Item Open Access Cracking open the black box: a geometric and topological analysis of neural networks (Colorado State University. Libraries, 2024) Cole, Christina, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Cheney, Margaret, committee member; Draper, Bruce, committee member
Deep learning is a subfield of machine learning that has exploded in recent years in terms of publications and commercial consumption. Despite their increasing prevalence in performing high-risk tasks, deep learning algorithms have outpaced our understanding of them. In this work, we home in on neural networks, the backbone of deep learning, and reduce them to their scaffolding defined by polyhedral decompositions. With these decompositions explicitly defined for low-dimensional examples, we utilize novel visualization techniques to build a geometric and topological understanding of them. From there, we develop methods of implicitly accessing neural networks' polyhedral skeletons, which provide substantial computational and memory savings compared to those requiring explicit access. While much of the related work using neural network polyhedral decompositions is limited to toy models and datasets, the savings provided by our method allow us to use state-of-the-art neural networks and datasets in our analyses. Our experiments alone demonstrate the viability of a polyhedral view of neural networks, and our results show its usefulness. More specifically, we show that the geometry that a polyhedral decomposition imposes on its neural network's domain contains signals that distinguish between original and adversarial images. We conclude our work with suggested future directions. In sum, we (1) contribute toward closing the gap between our use of neural networks and our understanding of them through geometric and topological analyses, and (2) outline avenues for extensions of this work.

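The "polyhedral decomposition" in the abstract above can be illustrated on a toy model: a ReLU network is affine on each region of inputs that shares an activation pattern, so counting distinct patterns counts the polyhedral cells encountered. The sketch below uses random weights and is not the dissertation's method.

```python
# Sketch of the polyhedral-decomposition idea for a tiny ReLU network: every
# input x gets an activation pattern (which ReLUs are on), and all inputs
# sharing a pattern lie in the same polyhedral cell, where the network is an
# affine map. Random weights; purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)    # layer 1: R^2 -> R^8
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)    # layer 2: R^8 -> R^8

def activation_pattern(x):
    """Return the on/off pattern of every ReLU for input x."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

# Sample a grid over [-1, 1]^2 and count how many distinct cells are hit.
grid = np.stack(np.meshgrid(np.linspace(-1, 1, 200), np.linspace(-1, 1, 200)), -1)
patterns = {activation_pattern(p) for p in grid.reshape(-1, 2)}
print("distinct polyhedral cells sampled:", len(patterns))
```
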
Item Open Access Disambiguating ambiguity: influence of various levels of uncertainty on neural systems mediating choice (Colorado State University. Libraries, 2011) Lopez Paniagua, Dan, author; Seger, Carol, advisor; Cleary, Anne, committee member; Draper, Bruce, committee member; Troup, Lucy, committee member
Previous studies have dissociated two types of uncertainty in decision making: risk and ambiguity. However, many of these studies have categorically defined ambiguity as a complete lack of information regarding outcome probabilities, thereby precluding the study of how various neural substrates may acknowledge and track levels of ambiguity. The present study provided a novel paradigm designed to address how decisions are made under varying states of uncertainty, ranging from risk to ambiguity. More importantly, the present study was designed to address limitations of previous studies of decision making under uncertainty: it explored neural regions sensitive to hidden but searchable information by parametrically controlling the amount of information hidden from the subject, using several levels of ambiguity manipulation instead of the single level used in previous studies, and it allowed subjects to freely choose the best option. Participants were asked to play one of two lotteries, one uncertain and one certain. Throughout the task, the certain lottery offered to participants was always a 100% chance of winning $1. This was contrasted with the uncertain lottery, in which various probabilities of winning (20%, 33%, 50%, or 80%) were combined with different potential gains ($2, $3, $5, or $8) so that expected values were better than, equal to, or worse than the expected value of the certain lottery. In our lotteries, the probability of winning or losing any given amount of money was indicated along the border of a wheel, increasing from 0% to 100% in a clockwise direction starting at the 12 o'clock position. For some uncertain lotteries and all certain lotteries, a "dial" explicitly indicated the probability of winning. For some uncertain lotteries, there was no dial to indicate a specific probability. Instead, a blinder that covered a portion of the wheel occluded the dial. This occlusion represented the possible range of percentages in which the actual probability of winning lay. Finally, the blinder covered 15%, 33%, 66%, 80%, or 100% of the wheel in order to vary the level of ambiguity. By manipulating the level of ambiguity, we were able to explore neural responses to different types of uncertainty ranging from risk to full ambiguity. Participants completed this task while BOLD contrast images were collected using a 3T MR scanner. Here, we show that both risk and ambiguity share a common network devoted to uncertainty processing in general. Moreover, we found support for the hypothesis that regions of the DLPFC might subserve contextual analysis when search of hidden information is both necessary and meaningful in order to optimize behavior in a decision making task; activation in the DLPFC peaked when the degraded information could be resolved by additional cognitive processing. Our results help to underscore the importance of studying varying degrees of uncertainty, as we found evidence for different neural responses to intermediate and high levels of ambiguity that are easy to overlook depending on how ambiguity is defined. Additionally, our results help reconcile two different accounts of brain activity during ambiguous decision making, one suggesting that uncertainty increases linearly and another suggesting that ambiguity processing is greater at intermediate levels. The graded coding of uncertainty we report may reflect a unified neural treatment of risk and ambiguity as limiting cases of a general system evaluating uncertainty, mediated by the DLPFC, which then recruits different regions of the prefrontal cortex as well as other valuation and learning systems according to the inherent difficulty of a decision.

Item Open Access EEG subspace analysis and classification using principal angles for brain-computer interfaces (Colorado State University. Libraries, 2015) Ashari, Rehab Bahaaddin, author; Anderson, Charles W., advisor; Ben-Hur, Asa, committee member; Draper, Bruce, committee member; Peterson, Chris, committee member
Brain-Computer Interfaces (BCIs) help paralyzed people who have lost some or all of their ability to communicate with and control the outside environment due to the loss of voluntary muscle control. Most BCIs are based on the classification of multichannel electroencephalography (EEG) signals recorded from users as they respond to external stimuli or perform various mental activities. The classification process is fraught with difficulties caused by electrical noise, signal artifacts, and nonstationarity. One approach to reducing the effects of similar difficulties in other domains is the use of principal angles between subspaces, which has been applied mostly to video sequences. This dissertation examines several ideas based on principal angles and subspace concepts, and introduces a novel mathematical approach for comparing sets of EEG signals for use in new BCI technology. The presented results show that principal angles are also a useful approach to the classification of EEG signals recorded during a BCI typing application. In this application, the appearance of a subject's desired letter is detected by identifying a P300 wave within a one-second window of EEG following the flash of a letter. Smoothing the signals before using them is the only preprocessing step implemented in this study. The smoothing process, based on minimizing the second derivative in time, is used to increase classification accuracy instead of a bandpass filter, which relies on assumptions about the frequency content of EEG. This study also examines four different ways of removing outliers based on principal angles and shows that these outlier removal methods did not help in the presented situations. One of the concepts this dissertation focuses on is the effect of the number of trials on classification accuracy. Good classification results were achieved with a small number of trials, starting from only two, which should make this approach more appropriate for online BCI applications. In order to understand and test how EEG signals differ from one subject to another, different users were tested in this dissertation, some with motor impairments. Furthermore, the concept of transferring information between subjects is examined by training the approach on one subject and testing it on another, using the training subject's EEG subspaces to classify the testing subject's trials.

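As a hedged illustration of the central quantity in the work above, the snippet below computes principal angles between two subspaces spanned by sets of trials, using orthonormal bases and an SVD; the "trials" are random stand-ins rather than EEG recordings.

```python
# Illustrative computation of principal angles between two subspaces, the core
# quantity used in the EEG classification work above. Random stand-in data.
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)                 # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                 # orthonormal basis for span(B)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

rng = np.random.default_rng(0)
trials_a = rng.normal(size=(256, 3))        # 3 "trials", 256 samples each
trials_b = rng.normal(size=(256, 3))
print(principal_angles(trials_a, trials_b))
# A classifier could assign a new set of trials to the class whose training
# subspace makes the smallest principal angles with it.
```
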
Item Open Access Effects of in-group bias on face recognition using minimal group procedures (Colorado State University. Libraries, 2014) Nguyen, Maia T., author; Troup, Lucy J., advisor; Draper, Bruce, committee member; Rhodes, Matthew G., committee member
The current series of experiments examined the effects of social categorization on face recognition. The use of minimal group procedures was expected to enhance recognition for in-group members compared to out-group members. In Experiment 1, participants were assigned to 1 of 3 conditions: name study, in which participants studied a list of 16 names associated with their in-group [red or green]; numerical estimation, in which participants were randomly divided into 2 groups [red or green] after estimating the number of dots in a series of 10 images; and a control condition. This was followed by a study phase in which participants were presented with a total of 32 female and male Caucasian faces on red or green backgrounds. A final recognition test was given following a filler task. Experiment 2 had two of the previously used conditions, name study and control. Faces were presented on red and green backgrounds during the test, with old faces presented on the same background as seen at study. Experiment 3 presented a subset of the stimuli used in Experiment 2 with a longer presentation time (10 seconds). Findings suggest only a moderate difference in response bias between experimental and control groups overall in Experiments 2 and 3. Moderate differences in hits, false alarms, and d' were also found in Experiment 3 between experimental conditions. Group membership did not elicit significant effects on measures of accuracy, reaction time, or confidence ratings.

Item Open Access Evaluating soft biometrics in the context of face recognition (Colorado State University. Libraries, 2013) Zhang, Hao, author; Beveridge, Ross, advisor; Draper, Bruce, committee member; Givens, Geof, committee member
Soft biometrics typically refer to attributes of people such as their gender, the shape of their head, the color of their hair, etc. There is growing interest in soft biometrics as a means of improving automated face recognition, since they hold the promise of significantly reducing recognition errors, in part by ruling out illogical choices. Here four experiments quantify performance gains on a difficult face recognition task when standard face recognition algorithms are augmented using information associated with soft biometrics. These experiments include a best-case analysis using perfect knowledge of gender and race, support vector machine-based soft biometric classifiers, face shape expressed through an active shape model, and finally appearance information from the image region directly surrounding the face. All four experiments indicate that small improvements may be made when soft biometrics augment an existing algorithm. However, in all cases the gains were modest. In the context of face recognition, empirical evidence suggests that significant gains using soft biometrics are hard to come by.

Item Open Access Evaluating the performance of iPhoto facial recognition at the biometric verification task (Colorado State University. Libraries, 2012) Patmore, Keegan P., author; Beveridge, Ross, advisor; Draper, Bruce, committee member; Givens, Geoff, committee member
The Faces feature of Apple's iPhoto '09 software uses facial recognition techniques to help people organize their digital photographs. This work seeks to measure the facial recognition performance of iPhoto Faces in order to gain insight into the progress of facial recognition systems in commercial software. A common performance evaluation protocol is explained and performance values are presented. The protocol is based on performance measurements of academic and biometric facial recognition systems performed at the National Institute of Standards and Technology. It uses the data set developed for the Good, the Bad, & the Ugly Face Recognition Challenge Problem, which contains images with varying levels of facial recognition difficulty. Results show high performance on the hardest faces to recognize, less than peak performance on the easier faces, and overall less variation in performance across varying levels of difficulty than is observed for alternative baseline algorithms.

Item Open Access Evaluating the role of context in 3D theater stage reconstruction (Colorado State University. Libraries, 2014) D'Souza, Wimroy, author; Beveridge, J. Ross, advisor; Draper, Bruce, committee member; Luo, J. Rockey, committee member
Recovering 3D structure from 2D images is a problem dating back to the 1960s. It is only recently, with the advancement of computing technology, that there has been substantial progress in solving this problem. In this thesis, we focus on one method for recovering scene structure given a single image. This method uses supervised learning techniques and a multiple-segmentation framework for adding contextual information to the inference. We evaluate the effect of this added contextual information by excluding it and measuring system performance. We then go on to evaluate the effect of the remaining system components, which include classifiers and image features. For example, in the case of classifiers, we substitute the original with others to see the level of accuracy that these provide. In the case of the features, we conduct experiments that identify the features that contribute most to classification accuracy. All of this put together lets us evaluate the effect of adding contextual information to the learning process and whether it can be improved by improving the other, non-contextual components of the system.

Item Open Access Exploiting geometry, topology, and optimization for knowledge discovery in big data (Colorado State University. Libraries, 2013) Ziegelmeier, Lori Beth, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Liu, Jiangguo (James), committee member; Draper, Bruce, committee member
In this dissertation, we consider several topics that are united by the theme of topological and geometric data analysis. First, we consider an application in landscape ecology using a well-known vector quantization algorithm to characterize and segment the color content of natural imagery. Color information in an image may be viewed naturally as clusters of pixels with similar attributes. The inherent structure and distribution of these clusters serves to quantize the information in the image and provides a basis for classification. A friendly graphical user interface called Biological Landscape Organizer and Semi-supervised Segmenting Machine (BLOSSM) was developed to aid in this classification. We consider four different choices for color space and five different metrics in which to analyze our data, and results are compared. Second, we present a novel topologically driven clustering algorithm that blends Locally Linear Embedding (LLE) and vector quantization by mapping color information to a lower dimensional space, identifying distinct color regions, and classifying pixels together based on both a proximity measure and color content. It is observed that these techniques permit a significant reduction in color resolution while maintaining the visually important features of images. Third, we develop a novel algorithm which we call Sparse LLE that leads to sparse representations in local reconstructions by using a data weighted 1-norm regularization term in the objective function of an optimization problem. It is observed that this new formulation has proven effective at automatically determining an appropriate number of nearest neighbors for each data point. We explore various optimization techniques, namely Primal Dual Interior Point algorithms, to solve this problem, comparing the computational complexity for each. Fourth, we present a novel algorithm that can be used to determine the boundary of a data set, or the vertices of a convex hull encasing a point cloud of data, in any dimension by solving a quadratic optimization problem. In this problem, each point is written as a linear combination of its nearest neighbors where the coefficients of this linear combination are penalized if they do not construct a convex combination, revealing those points that cannot be represented in this way, the vertices of the convex hull containing the data. Finally, we exploit the relatively new tool from topological data analysis, persistent homology, and consider the use of vector bundles to re-embed data in order to improve the topological signal of a data set by embedding points sampled from a projective variety into successive Grassmannians.

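The Sparse LLE idea described above (sparse local reconstructions through a 1-norm penalty) can be illustrated with a simplified sketch: an ordinary, unweighted 1-norm via scikit-learn's Lasso stands in for the data-weighted penalty and interior point solvers of the dissertation, so this is an approximation of the idea rather than the algorithm itself.

```python
# Simplified illustration of sparse local reconstruction: rebuild each point
# from its k nearest neighbors under a 1-norm penalty so that unneeded
# neighbors receive zero weight. A plain (unweighted) 1-norm via Lasso is used
# here as a stand-in for the data-weighted penalty described above.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # toy point cloud in R^3
k = 12
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nbrs.kneighbors(X)                      # idx[:, 0] is the point itself

i = 0                                            # reconstruct the first point
neighbors = X[idx[i, 1:]]                        # its k nearest neighbors
model = Lasso(alpha=0.05, fit_intercept=False).fit(neighbors.T, X[i])
weights = model.coef_                            # sparse reconstruction weights
print("nonzero reconstruction weights:", np.count_nonzero(weights), "of", k)
```
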
Item Open Access Exploring user-defined gestures for alternate interaction space for smartphones and smartwatches (Colorado State University. Libraries, 2016) Arefin Shimon, Shaikh Shawon, author; Ruiz, Jaime, advisor; Draper, Bruce, committee member; Montgomery, Tai, committee member
In smartphones and smartwatches, the input space is limited due to their small form factor. Although many studies have highlighted the possibility of expanding the interaction space for these devices, limited work has been conducted on exploring end-user preferences for gestures in the proposed interaction spaces. In this dissertation, I present the results of two elicitation studies that explore end-user preferences for creating gestures in the proposed alternate interaction spaces for smartphones and smartwatches. Using the data collected from the two elicitation studies, I present gestures preferred by end-users for common tasks that can be performed using smartphones and smartwatches. I also present the end-user mental models for interaction in the proposed interaction spaces for these devices, and highlight common user motivations and preferences for suggested gestures. Based on the findings, I present design implications for incorporating the proposed alternate interaction spaces into smartphones and smartwatches.

Item Open Access Garbage elimination in SA-C host code (Colorado State University. Libraries, 2001) Segreto, Steve, author; Bohm, Wim, advisor; Draper, Bruce, committee member
Single-assignment C (SA-C) is a functional programming language with a rich instruction set designed to create and manipulate arrays using array slices and window generators. It is well-suited for the fields of graphics, AI, and image processing within reconfigurable computing environments. Garbage is defined as any SA-C array data which is unused or unreferenced in the host code program heap at any time. Garbage must not be created, and it must be freed as soon as possible. In this paper it is shown that the single-assignment properties of the language create garbage when single-assignment occurs in loops. This behavior is studied and a static solution called pointer reuse is presented. The non-circular aliases resulting from strict single-assignment alias creation, coupled with the side-effect free nature of statement blocks, lead to a dynamic reference counting technique which can provide immediate elimination of garbage. Aliases and special loop-carried variable dependencies complicate matters further and are examined in this paper.

Item Open Access Grassmann, Flag, and Schubert varieties in applications (Colorado State University. Libraries, 2017) Marrinan, Timothy P., author; Kirby, Michael, advisor; Peterson, Chris, advisor; Azimi-Sadjadi, Mahmood R., committee member; Bates, Dan, committee member; Draper, Bruce, committee member
This dissertation develops mathematical tools for signal processing and pattern recognition tasks where data with the same identity is assumed to vary linearly. We build on the growing canon of techniques for analyzing and optimizing over data on Grassmann manifolds. Specifically, we expand on a recently developed method referred to as the flag mean, which finds an average representation for a collection of data that consists of linear subspaces of possibly different dimensions. When prior knowledge exists about relationships between these data, we show that a point analogous to the flag mean can be found as an element of a Schubert variety that incorporates this theoretical information. This domain restriction relates closely to a recent result regarding point-to-set functions. This restricted average, along with a property of the flag mean that prioritizes weak but common information, leads to practical applications of the flag mean such as chemical plume detection in long-wave infrared hyperspectral videos, and a modification of the well-known diffusion map for adaptively visualizing data relationships in two dimensions.

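As a hedged sketch of the flag mean referenced in the abstract above, following the construction commonly described in the literature (an SVD of horizontally concatenated orthonormal bases), the snippet below averages three random subspaces of mixed dimensions; it is not claimed to match every detail of the dissertation's formulation.

```python
# Hedged sketch of a flag mean computation: concatenate orthonormal bases of
# the input subspaces and take left singular vectors of the result; the
# ordered singular vectors define the nested flag. Random subspaces only.
import numpy as np

rng = np.random.default_rng(0)
ambient = 50
# Three subspaces of mixed dimensions, each given by an orthonormal basis.
bases = [np.linalg.qr(rng.normal(size=(ambient, d)))[0] for d in (3, 5, 4)]

stacked = np.hstack(bases)                      # ambient x (3 + 5 + 4)
U, S, _ = np.linalg.svd(stacked, full_matrices=False)
flag_mean_2d = U[:, :2]                         # 2-dimensional piece of the flag
print(flag_mean_2d.shape)                       # (50, 2)
```
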
Item Open Access Large margin methods for partner specific prediction of interfaces in protein complexes (Colorado State University. Libraries, 2014) Minhas, Fayyaz ul Amir Afsar, author; Ben-Hur, Asa, advisor; Draper, Bruce, committee member; Anderson, Charles, committee member; Snow, Christopher, committee member
The study of protein interfaces and binding sites is a very important domain of research in bioinformatics. Information about the interfaces between proteins can be used not only in understanding protein function but can also be directly employed in drug design and protein engineering. However, the experimental determination of protein interfaces is cumbersome, expensive, and not possible in some cases with today's technology. As a consequence, the computational prediction of protein interfaces from sequence and structure has emerged as a very active research area. A number of machine learning based techniques have been proposed for the solution of this problem. However, the prediction accuracy of most such schemes is very low. In this dissertation we present large-margin classification approaches designed to directly model different aspects of protein complex formation as well as the characteristics of available data. Most existing machine learning techniques for this task are partner-independent in nature, i.e., they ignore the fact that the propensity of a protein to bind to another protein depends upon characteristics of residues in both proteins. We have developed a pairwise support vector machine classifier called PAIRpred to predict protein interfaces in a partner-specific fashion. Due to its more detailed model of the problem, PAIRpred offers state of the art accuracy in predicting both binding sites at the protein level and inter-protein residue contacts at the complex level. PAIRpred uses sequence and structure conservation, local structural similarity and surface geometry, residue solvent exposure, and template based features derived from the unbound structures of the proteins forming a protein complex. We have investigated the impact of explicitly modeling, through transductive and semi-supervised learning models, the inter-dependencies between residues that are imposed by the overall structure of a protein during the formation of a protein complex. We also present a novel multiple instance learning scheme called MI-1 that explicitly models imprecision in sequence-level annotations of binding sites in proteins that bind calmodulin, achieving state of the art prediction accuracy for this task.

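One common way partner-specific pairwise SVMs are built is with a symmetric pairwise kernel that combines single-residue kernels across both proteins (in the style of Ben-Hur and Noble's pairwise kernels). The sketch below shows that construction on random stand-in residue features; it is not claimed to reproduce PAIRpred's exact kernel or feature set.

```python
# Hedged sketch of a symmetric pairwise kernel of the kind used by pairwise
# SVM interface predictors: a residue pair (one residue from each protein) is
# compared to another pair by combining single-residue kernels symmetrically.
import numpy as np

def rbf(x, y, gamma=0.5):
    """Single-residue RBF kernel on feature vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def pairwise_kernel(pair1, pair2):
    """Symmetric kernel between residue pairs (a, b) and (c, d)."""
    a, b = pair1
    c, d = pair2
    return rbf(a, c) * rbf(b, d) + rbf(a, d) * rbf(b, c)

rng = np.random.default_rng(0)
a, b, c, d = rng.normal(size=(4, 8))             # stand-in residue feature vectors
print(pairwise_kernel((a, b), (c, d)))
```
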
Item Open Access Large-scale automated protein function prediction (Colorado State University. Libraries, 2016) Kahanda, Indika, author; Ben-Hur, Asa, advisor; Anderson, Chuck, committee member; Draper, Bruce, committee member; Zhou, Wen, committee member
Proteins are the workhorses of life, and identifying their functions is a very important biological problem. The function of a protein can be loosely defined as everything it does or that happens to it. The Gene Ontology (GO) is a structured vocabulary which captures protein function in a hierarchical manner and contains thousands of terms. Through various wet-lab experiments over the years, scientists have been able to annotate a large number of proteins with GO categories which reflect their functionality. However, experimentally determining protein functions is a highly resource-intensive task, and a large fraction of proteins remain un-annotated. Recently, a plethora of automated methods have emerged, and their reasonable success in computationally determining the functions of proteins using a variety of data sources (sequence/structure similarity or various biological network data) has led to the establishment of automated function prediction (AFP) as an important problem in bioinformatics. In a typical machine learning problem, cross-validation is the protocol of choice for evaluating the accuracy of a classifier. But, due to the accumulation of annotations over time, we identify AFP as a combination of two sub-tasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In our first project, we analyze the performance of several protein function prediction methods in these two scenarios. Our results show that GOstruct, an AFP method that our lab has previously developed, and two other popular methods, binary SVMs and guilt by association, find it hard to achieve the same level of accuracy on these two tasks compared to the performance evaluated through cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We develop GOstruct 2.0 by proposing improvements that allow the model to make use of a protein's current annotations to better handle the task of predicting novel annotations for previously annotated proteins. Experimental results on yeast and human data show that GOstruct 2.0 outperforms the original GOstruct, demonstrating the effectiveness of the proposed improvements. Although the biomedical literature is a very informative resource for identifying protein function, most AFP methods do not take advantage of the large amount of information contained in it. In our second project, we conduct the first comprehensive evaluation of the effectiveness of literature data for AFP. Specifically, we extract co-mentions of protein-GO term pairs and bag-of-words features from the literature and explore their effectiveness in predicting protein function. Our results show that literature features are very informative of protein function but with further room for improvement. In order to improve the quality of automatically extracted co-mentions, we formulate the classification of co-mentions as a supervised learning problem and propose a novel method based on graph kernels. Experimental results indicate the feasibility of using this co-mention classifier as a complementary method that aids the bio-curators who are responsible for maintaining databases such as the Gene Ontology. This is the first study of the problem of protein-function relation extraction from biomedical text. The recently developed Human Phenotype Ontology (HPO), which is very similar to GO, is a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein coding genes have HPO annotations, but researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In our third project, we introduce PHENOstruct, a computational method that directly predicts the set of HPO terms for a given gene. We compare PHENOstruct with several baseline methods and show that it outperforms them in every respect. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large scale literature mining data.

Item Open Access Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets (Colorado State University. Libraries, 2017) Malensek, Matthew, author; Pallickara, Shrideep, advisor; Pallickara, Sangmi Lee, advisor; Bohm, A. P. Willem, committee member; Draper, Bruce, committee member; Breidt, F. Jay, committee member
Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks.

Item Open Access Object and action detection methods using MOSSE filters (Colorado State University. Libraries, 2012) Arn, Robert T., author; Kirby, Michael, advisor; Peterson, Chris, advisor; Draper, Bruce, committee member
In this thesis we explore the application of the Minimum Output Sum of Squared Error (MOSSE) filter to object detection in images as well as action detection in video. We exploit the properties of the Fourier transform for computing correlations in two and three dimensions. We perform a comprehensive examination of the shape parameters of the desired target response and determine values to optimize the filter performance for specific objects and actions. In addition, we propose the Gaussian Iterative Response (GIR) algorithm and the Multi-Sigma Geometric Mean method to improve the MOSSE filter response on test signals. Also, new detection criteria are investigated and shown to boost the detection accuracy on two well-known data sets.

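The core MOSSE computation is well documented (Bolme et al., CVPR 2010): a correlation filter is solved for in the Fourier domain so that each training image maps to a narrow Gaussian response. The sketch below implements that closed form on random stand-in images; the sigma, regularization constant, and image sizes are illustrative, and the thesis's GIR and Multi-Sigma extensions are not reproduced.

```python
# Hedged numpy sketch of the basic MOSSE filter: learn a correlation filter in
# the Fourier domain whose response to each training image is a Gaussian peak
# centered on the target. Training "images" are random stand-ins.
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired correlation output: a 2D Gaussian peak at `center`."""
    y, x = np.mgrid[:shape[0], :shape[1]]
    return np.exp(-((x - center[1]) ** 2 + (y - center[0]) ** 2) / (2 * sigma**2))

def train_mosse(images, centers, eps=1e-3):
    """Return the conjugate filter H* from training images and target centers."""
    num = np.zeros(images[0].shape, dtype=complex)
    den = np.zeros_like(num)
    for img, c in zip(images, centers):
        F = np.fft.fft2(img)
        G = np.fft.fft2(gaussian_response(img.shape, c))
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + eps)

rng = np.random.default_rng(0)
imgs = [rng.normal(size=(64, 64)) for _ in range(8)]
H_conj = train_mosse(imgs, centers=[(32, 32)] * 8)
response = np.real(np.fft.ifft2(np.fft.fft2(imgs[0]) * H_conj))
print("peak response at:", np.unravel_index(response.argmax(), response.shape))
```
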
Item Open Access On the evaluation of exact-match and range queries over multidimensional data in distributed hash tables (Colorado State University. Libraries, 2012) Malensek, Matthew, author; Pallickara, Shrideep, advisor; Draper, Bruce, committee member; Randall, David, committee member
The quantity and precision of geospatial and time series observational data being collected has increased alongside the steady expansion of processing and storage capabilities in modern computing hardware. The storage requirements for this information are vastly greater than the capabilities of a single computer, and are primarily met in a distributed manner. However, distributed solutions often impose strict constraints on retrieval semantics. In this thesis, we investigate the factors that influence storage and retrieval operations on large datasets in a cloud setting, and propose a lightweight data partitioning and indexing scheme to facilitate these operations. Our solution provides expressive retrieval support through range-based and exact-match queries and can be applied over massive quantities of multidimensional data. We provide benchmarks to illustrate the relative advantage of using our solution over a general-purpose cloud storage engine in a distributed network of heterogeneous computing resources.
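As a generic illustration of the exact-match versus range-query tension discussed in the abstract above (and not the thesis's actual partitioning scheme), the sketch below contrasts hashing a full key, which supports only exact-match routing, with placing data by a coarse order-preserving spatial prefix so that a range query touches a bounded set of nodes; the node count and cell size are arbitrary.

```python
# Hypothetical illustration: exact-match routing via hashing versus
# prefix-bucket placement that keeps spatial range queries on a bounded set
# of nodes. All parameters are invented.
import hashlib

NODES = 8

def node_for(key: str) -> int:
    """Exact-match placement: hash the full key to one of NODES nodes."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NODES

def bucket_key(lat: float, lon: float, cell: float = 1.0) -> str:
    """Coarse spatial bucket (order-preserving prefix) used for placement."""
    return f"{int(lat // cell)}:{int(lon // cell)}"

def nodes_for_range(lat_lo, lat_hi, lon_lo, lon_hi, cell=1.0):
    """A range query only touches the nodes owning the overlapping buckets."""
    targets = set()
    for la in range(int(lat_lo // cell), int(lat_hi // cell) + 1):
        for lo in range(int(lon_lo // cell), int(lon_hi // cell) + 1):
            targets.add(node_for(f"{la}:{lo}"))
    return targets

print(node_for(bucket_key(40.57, -105.08)))          # exact-match routing
print(nodes_for_range(40.0, 42.0, -106.0, -104.0))   # bounded node set for a range
```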