Browsing by Author "Anderson, Charles W., advisor"
Now showing 1 - 16 of 16
Item Open Access A comparison of tri-polar concentric ring electrodes to disc electrodes for decoding real and imaginary finger movements (Colorado State University. Libraries, 2019) Alzahrani, Saleh Ibrahim, author; Anderson, Charles W., advisor; Vigh, Jozsef, committee member; Rojas, Don, committee member; Abdel-Ghany, Salah, committee member
The electroencephalogram (EEG) is broadly used for diagnosis of brain diseases and research of brain activities. Although the EEG provides good temporal resolution, it suffers from poor spatial resolution due to the blurring effects of volume conduction and a low signal-to-noise ratio. Many efforts have been devoted to the development of novel methods that can increase the EEG spatial resolution. The surface Laplacian, which is the second derivative of the surface potential, has been applied to EEG to improve the spatial resolution. Tri-polar concentric ring electrodes (TCREs) have been shown to estimate the surface Laplacian automatically with better spatial resolution than conventional disc electrodes. The aim of this research is to study how well the TCREs can be used to acquire EEG signals to decode real and imaginary finger movements. These EEG signals will then be translated into finger movement commands. We also compare the feasibility of discriminating finger movements from one hand using EEG recorded from TCREs and conventional disc electrodes. Furthermore, we evaluated two movement-related features, temporal EEG data and spectral features, in discriminating individual fingers from one hand using non-invasive EEG. To do so, movement-related potentials (MRPs) were measured and analyzed from four TCREs and conventional disc electrodes while 13 subjects performed either motor execution or motor imagery of individual finger movements. The tri-polar EEG (tEEG) and conventional EEG (cEEG) were recorded from electrodes placed according to the 10-20 International Electrode Positioning System over the motor cortex. 
Our results show that the TCREs achieved higher spatial resolution than conventional disc electrodes. Moreover, the results show that signals from TCREs generated higher decoding accuracy compared to signals from conventional disc electrodes. The average decoding accuracy of five-class classification for all subjects was 70.04 ± 7.68% when we used temporal EEG data as the feature and classified it using an artificial neural network (ANN) classifier. In addition, the results show that the TCRE EEG (tEEG) provides approximately a fourfold enhancement in the signal-to-noise ratio (SNR) compared to disc electrode signals. We also evaluated the interdependency level between neighboring electrodes from tri-polar, disc, and disc with Hjorth's Laplacian method in the time and frequency domains by calculating the mutual information (MI) and coherence. The MRP signals recorded with the TCRE system have significantly less MI between electrodes than the conventional disc electrode system and disc electrodes with Hjorth's Laplacian method. Also, the mean coherence between neighboring tri-polar electrodes was found to be significantly smaller than that of disc electrodes and disc electrodes with Hjorth's method, especially at higher frequencies. This lower coherence in the high frequency band between neighboring tri-polar electrodes suggests that the TCREs may record more localized neuronal activity. The successful decoding of finger movements can provide extra degrees of freedom to drive brain-computer interface (BCI) applications, especially for neurorehabilitation.
Item Open Access An echo state model of non-Markovian reinforcement learning (Colorado State University. Libraries, 2008) Bush, Keith A., author; Anderson, Charles W., advisor; Draper, Bruce A. 
(Bruce Austin), 1962-, committee member; Kirby, Michael, 1961-, committee member; Young, Peter M., committee member
There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically, the state-action space must exhibit the Markov property in order for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to domains where the state-action space is approximately Markovian, a requirement for the overwhelming majority of real-world domains. These domains, termed non-Markovian reinforcement learning domains, raise a unique set of practical challenges. The reconstruction dimension required to approximate a Markovian state-space is unknown a priori and can potentially be large. Further, the spatial complexity of local function approximation of the reinforcement learning domain grows exponentially with the reconstruction dimension. Parameterized dynamic systems alleviate both embedding length and state-space dimensionality concerns by reconstructing an approximate Markovian state-space via a compact, recurrent representation. Yet this representation exacts a cost: modeling reinforcement learning domains via adaptive, parameterized dynamic systems is characterized by instability, slow convergence, and high computational or spatial training complexity. The objectives of this research are to demonstrate a stable, convergent, accurate, and scalable model of non-Markovian reinforcement learning domains. These objectives are fulfilled via fixed point analysis of the dynamics underlying the reinforcement learning domain and the Echo State Network, a class of parameterized dynamic system. Understanding models of non-Markovian reinforcement learning domains requires understanding the interactions between learning domains and their models. 
Fixed point analysis of the Mountain Car Problem reinforcement learning domain, for both local and nonlocal function approximations, suggests a close relationship between the locality of the approximation and the number and severity of bifurcations of the fixed point structure. This research suggests the likely cause of this relationship: reinforcement learning domains exist within a dynamic feature space in which trajectories are analogous to states. The fixed point structure maps dynamic space onto state-space. This explanation suggests two testable hypotheses. First, reinforcement learning is sensitive to state-space locality because states cluster as trajectories in time rather than space. Second, models using trajectory-based features should exhibit good modeling performance and few changes in fixed point structure. Analysis of the performance of lookup table, feedforward neural network, and Echo State Network (ESN) models on the Mountain Car Problem reinforcement learning domain confirms these hypotheses. The ESN is a large, sparse, randomly-generated, unadapted recurrent neural network, which adapts a linear projection of the target domain onto the hidden layer. ESN modeling results on reinforcement learning domains show it achieves performance comparable to lookup table and neural network architectures on the Mountain Car Problem with minimal changes to fixed point structure. Also, the ESN achieves lookup-table-caliber performance when modeling Acrobot, a four-dimensional control problem, but is less successful modeling the lower dimensional Modified Mountain Car Problem. These performance discrepancies are attributed to the ESN’s excellent ability to represent complex short-term dynamics, and its inability to consolidate long temporal dependencies into a static memory. Without memory consolidation, reinforcement learning domains exhibiting attractors with multiple dynamic scales are unlikely to be well-modeled via ESN. 
To mitigate this problem, a simple ESN memory consolidation method is presented and tested for stationary dynamic systems. These results indicate the potential to improve modeling performance in reinforcement learning domains via memory consolidation.
Item Open Access Classification of P300 from non-invasive EEG signal using convolutional neural network (Colorado State University. Libraries, 2022) Farhat, Nazia, author; Anderson, Charles W., advisor; Kirby, Michael, committee member; Blanchard, Nathaniel, committee member
A Brain-Computer Interface (BCI) system is a communication tool for patients with neuromuscular diseases. The efficiency of such a system largely depends on the accurate and reliable detection of the brain signal employed in its operation. The P300 Speller, a well-known BCI system that helps the user select the desired letter during communication, uses an electroencephalography (EEG) signal called the P300 brain wave. The spatiotemporal nature and low signal-to-noise ratio of the P300 signal, along with its high dimensionality, impose difficulties in its accurate recognition. Moreover, its inter- and intra-subject variability necessitates a case-specific experimental setup, requiring a considerable amount of time and resources before the system can be deployed for use. In this thesis, a Convolutional Neural Network (CNN) is applied to detect the P300 signal and to observe the distinguishing features of P300 and non-P300 signals extracted by the neural network. Three different filter shapes, namely 1-D CNN, 2-D CNN, and 3-D CNN, are examined separately to evaluate their ability to detect the target signals. Virtual channels created with three different weighting techniques are explored in the 3-D CNN analysis. Both within-subject and cross-subject examinations are performed. Higher single-trial accuracy is observed for all the subjects with the CNN implementation compared to that achieved with Stepwise Linear Discriminant Analysis. 
Up to approximately 80% within-subject accuracy and 64% cross-subject accuracy are recorded in this research. The 1-D CNN outperforms all the other models in terms of classification accuracy.
Item Open Access Classification using out of sample testing of neural networks and Siamese-like neural network for handwritten characters (Colorado State University. Libraries, 2020) Yeluri, Sri Sagar Abhishek, author; Anderson, Charles W., advisor; Beveridge, Ross, committee member; Hess, Ann, committee member
In a world where Machine Learning algorithms in the field of Image Processing are being developed at a rapid pace, a developer needs better insight into all the algorithms in order to choose one for their application. When an algorithm is published, its developers compare it with already available, well-performing algorithms and claim that it outperforms all or the majority of them in terms of accuracy. However, adaptability is a very important aspect of Machine Learning that is usually not mentioned in their papers. Adaptability is the ability of a Machine Learning algorithm to work reliably in the real world, despite changes in environmental factors relative to the environment in which the training data were recorded. A machine learning algorithm that gives good results only on its own dataset has no practical applications. In real life, an algorithm becomes more widely applicable only as it becomes more adaptable. A few other aspects that are important in choosing the right algorithm for an application are consistency, time and resource utilization, and the availability of human intervention. A person choosing among a list of algorithms for an application will be able to make a wise decision if given additional information, as each application differs from the others and needs a different set of algorithm characteristics to be well received. 
We have implemented three Machine Learning algorithms used in image processing and compared their results on two different datasets. We observe that certain algorithms, even though better than others in terms of accuracy on paper, fall behind when tested on real-world datasets. We put forward a few suggestions that, if followed, will simplify the selection of an algorithm for a specific purpose.
Item Open Access Dimensionality reduction and classification of time embedded EEG signals (Colorado State University. Libraries, 2007) Teli, Mohammad Nayeem, author; Anderson, Charles W., advisor; McConnell, Ross, committee member; Kirby, Michael, 1961-, committee member
An electroencephalogram (EEG) is a recording of the electrical activity of the brain obtained by placing electrodes on the scalp. These EEG signals give the micro-voltage difference between different parts of the brain in a non-invasive manner. The brain activity measured in this way is currently being analyzed for possible diagnosis of physiological and psychiatric diseases. These signals have also found their way into cognitive research. At Colorado State University we are investigating the use of EEG as computer input. In this particular research our goal is to classify two mental tasks. A subject is asked to think about a mental task and the EEG signals are measured using six electrodes on the subject's scalp. In order to differentiate between two different tasks, the EEG signals produced by each task need to be classified. We hypothesize that a bottleneck neural network would help us to classify EEG data much better than classification techniques like Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Support Vector Machines (SVM). A five-layer bottleneck neural network is trained using a fast convergence algorithm (a variation of the Levenberg-Marquardt algorithm) and Scaled Conjugate Gradient (SCG). 
Classification is compared between a neural network, LDA, QDA and SVM for both raw EEG data and bottleneck layer output. Results indicate that QDA and SVM classify raw EEG data better without a bottleneck network. QDA and SVM always achieved higher classification accuracy than the neural network with a bottleneck layer in all our experiments. The neural network achieved a best classification accuracy of 92% of test samples correctly classified, whereas QDA achieved 100% accuracy in classifying the test data.
Item Open Access EEG subspace analysis and classification using principal angles for brain-computer interfaces (Colorado State University. Libraries, 2015) Ashari, Rehab Bahaaddin, author; Anderson, Charles W., advisor; Ben-Hur, Asa, committee member; Draper, Bruce, committee member; Peterson, Chris, committee member
Brain-Computer Interfaces (BCIs) help paralyzed people who have lost some or all of their ability to communicate and control the outside environment due to loss of voluntary muscle control. Most BCIs are based on the classification of multichannel electroencephalography (EEG) signals recorded from users as they respond to external stimuli or perform various mental activities. The classification process is fraught with difficulties caused by electrical noise, signal artifacts, and nonstationarity. One approach to reducing the effects of similar difficulties in other domains is the use of principal angles between subspaces, which has been applied mostly to video sequences. This dissertation studies and examines different ideas using the concepts of principal angles and subspaces. It introduces a novel mathematical approach for comparing sets of EEG signals for use in new BCI technology. The presented results show that principal angles are also a useful approach to the classification of EEG signals that are recorded during a BCI typing application. 
In this application, the appearance of a subject's desired letter is detected by identifying a P300 wave within a one-second window of EEG following the flash of a letter. Smoothing the signals before using them is the only preprocessing step that was implemented in this study. A smoothing process based on minimizing the second derivative in time is implemented to increase the classification accuracy, instead of using a bandpass filter that relies on assumptions about the frequency content of EEG. This study examines four different ways of removing outliers that are based on the principal angles and shows that the outlier removal methods did not help in the presented situations. One of the concepts that this dissertation focuses on is the effect of the number of trials on the classification accuracies. Achieving good classification results with a small number of trials, starting from only two, should make this approach more appropriate for online BCI applications. In order to understand and test how EEG signals differ from one subject to another, different users are tested in this dissertation, some with motor impairments. Furthermore, the concept of transferring information between subjects is examined by training the approach on one subject and testing it on another subject, using the training subject's EEG subspaces to classify the testing subject's trials.
Item Open Access Machine learned boundary definitions for an expert's tracing assistant in image processing (Colorado State University. Libraries, 2003) Crawford-Hines, Stewart, author; Anderson, Charles W., advisor; Draper, Bruce A. (Bruce Austin), 1962-, committee member; Beveridge, J. Ross, committee member; Alciatore, David G., committee member
Most image processing work addressing boundary definition tasks embeds the assumption that an edge in an image corresponds to the boundary of interest in the world. 
In straightforward imagery this is true; however, it is not always the case. There are images in which edges are indistinct or obscure, and these images can only be segmented by a human expert. The work in this dissertation addresses the range of imagery between the two extremes of those straightforward images and those requiring human guidance to appropriately segment. By freeing systems of a priori edge definitions and building in a mechanism to learn the boundary definitions needed, systems can do better and be more broadly applicable. This dissertation presents the construction of such a boundary-learning system and demonstrates the validity of this premise on real data. A framework was created for the task in which expert-provided boundary exemplars are used to create training data, which in turn are used by a neural network to learn the task and replicate the expert's boundary tracing behavior. This is the framework for the Expert's Tracing Assistant (ETA) system. For a representative set of nine structures in the Visible Human imagery, ETA was compared and contrasted to two state-of-the-art, user-guided methods: Intelligent Scissors (IS) and Active Contour Models (ACM). Each method was used to define a boundary, and the distances between these boundaries and an expert's ground truth were compared. Across independent trials, there will be a natural variation in an expert's boundary tracing, and this degree of variation served as a benchmark against which these three methods were compared. For simple structural boundaries, all the methods were equivalent. However, in more difficult cases, ETA was shown to replicate the expert's boundary significantly better than either IS or ACM. 
In these cases, where the expert's judgement was most called into play to bound the structure, ACM and IS could not adapt to the boundary character used by the expert while ETA could.
Item Open Access Machine learning-based fusion studies of rainfall estimation from spaceborne and ground-based radars (Colorado State University. Libraries, 2019) Tan, Haiming, author; Anderson, Charles W., advisor; Chandra, Chandrasekar V., advisor; Ray, Indrajit, committee member; Chavez, Jose L., committee member
Precipitation measurement by satellite radar plays a significant role in researching the water cycle and forecasting extreme weather events. The Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR) is capable of providing a high-resolution vertical profile of precipitation over tropical regions. Its successor, the Global Precipitation Measurement (GPM) Dual-frequency Precipitation Radar (DPR), can provide detailed information on the microphysical properties of precipitation particles, quantify particle size distribution, and quantitatively measure light rain and falling snow. This thesis presents a novel Machine Learning system for ground-based and spaceborne radar rainfall estimation. The system first trains on ground radar data for rainfall estimation using rainfall measurements from gauges, and subsequently uses the ground radar based rainfall estimates to train on spaceborne radar data in order to obtain a space-based rainfall product. Therein, data alignment between spaceborne and ground radar is conducted using the methodology proposed by Bolen and Chandrasekar (2013), which can minimize the effects of potential geometric distortion of spaceborne radar observations. For demonstration purposes, rainfall measurements from three rain gauge networks near Melbourne, Florida, are used for training and validation. These three gauge networks, which are located in Kennedy Space Center (KSC), South Florida Water Management District (SFL), and St. 
Johns Water Management District (STJ), include 33, 46, and 99 rain gauge stations, respectively. Collocated ground radar observations from the National Weather Service (NWS) Weather Surveillance Radar – 1988 Doppler (WSR-88D) in Melbourne (i.e., KMLB radar) are trained with the gauge measurements. The trained model is then used to derive a KMLB radar based rainfall product, which is used to train on both TRMM PR and GPM DPR data collected from coincident overpass events. The machine learning based rainfall product is compared against the standard satellite products, and the comparison shows great potential for the machine learning concept in satellite radar rainfall estimation. Also, the local rain maps generated by the machine learning system for the KMLB area demonstrate the application potential.
Item Open Access P300 wave detection using Emotiv EPOC+ headset: effects of matrix size, flash duration, and colors (Colorado State University. Libraries, 2016) Alzahrani, Saleh Ibrahim, author; Anderson, Charles W., advisor; Vigh, Jozsef, committee member; Gavin, William, committee member
Brain-computer interfaces (BCIs) allow interactions between human beings and computers without using voluntary muscles. Enormous research effort has been devoted in the last few decades to designing convenient and user-friendly interfaces. The aim of this study is to provide people with severe neuromuscular disorders a new augmentative communication technology so that they can express their wishes and communicate with others. The research investigates the capability of the Emotiv EPOC+ headset to capture and record one of the BCI signals, called P300, that is used in several applications such as the P300 speller. The P300 speller is a BCI system used to enable severely disabled people to spell words and convey their thoughts without any physical effort. In this thesis, the effects of matrix size, flash duration, and colors were studied. Data are collected from five healthy subjects in their home environments. 
Several programs, such as the OpenViBE platform and MATLAB, are used in this experiment to pre-process and classify the EEG data. Moreover, the Linear Discriminant Analysis (LDA) classification algorithm is used to classify the data into target and non-target samples.
Item Open Access Scalable and data efficient deep reinforcement learning methods for healthcare applications (Colorado State University. Libraries, 2019) Saripalli, Venkata Ratnam, author; Anderson, Charles W., advisor; Hess, Ann Marie, committee member; Young, Peter, committee member; Simske, Steve John, committee member
Artificial intelligence driven medical devices have created the potential for significant breakthroughs in healthcare technology. Healthcare applications using reinforcement learning are still very sparse, as the medical domain is very complex and decision making requires domain expertise. High volumes of data generated from medical devices, a key input for delivering on the promise of AI, suffer from both noise and lack of ground truth. The cost of data increases as it is cleaned and annotated. Unlike other data sets, medical data annotation, which is critical for accurate ground truth, requires medical domain expertise for a high-quality patient outcome. While accurate recommendation of decisions is vital in this context, making them in near real-time on devices with computational resource constraints requires that we build efficient, compact representations of models such as deep neural networks. While deeper and wider neural networks are designed for complex healthcare applications, model compression can be an effective way to deploy networks on medical devices that often have hardware and speed constraints. Most state-of-the-art model compression techniques require a resource-centric manual process that explores a large model architecture space to find a trade-off solution between model size and accuracy. 
Recently, reinforcement learning (RL) approaches have been proposed to automate such a hand-crafted process. However, most RL model compression algorithms are model-free, which require longer training time but make no assumptions about the model. In contrast, model-based (MB) approaches are data driven and converge faster, but are sensitive to bias in the model. In this work, we report on the use of reinforcement learning to mimic the decision-making process of annotators for medical events, to automate annotation and labelling. The reinforcement agent learns to annotate alarm data based on annotations done by an expert. Our method shows promising results on medical alarm data sets. We trained deep Q-network and advantage actor-critic agents using data from monitoring devices that were annotated by an expert. Initial results from these RL agents learning the expert-annotated behavior are encouraging. The advantage actor-critic agent performs better at learning the sparse events in a given state, thereby choosing more correct actions than the deep Q-network agent. To the best of our knowledge, this is the first reinforcement learning application for the automation of medical events annotation, which has far-reaching practical use. In addition, a data-driven model-based algorithm is developed, which integrates seamlessly with model-free RL approaches for automation of deep neural network model compression. We evaluate our algorithm on a variety of imaging data, from dermoscopy to X-ray, on different popular and public model architectures. Compared to model-free RL approaches, our approach achieves faster convergence, exhibits better generalization across different data sets, and preserves comparable model performance. 
The application of the new RL methods from this work to the healthcare domain, for both false alarm detection and model compression, is generic and can be extended to any domain where sequential decision making is partially random and practically controlled by the decision maker.
Item Open Access Sentiment analysis in the Arabic language using machine learning (Colorado State University. Libraries, 2015) Alotaibi, Saud Saleh, author; Anderson, Charles W., advisor; Ben-Hur, Asa, committee member; Ray, Indrakshi, committee member; Peterson, Chris, committee member
Sentiment analysis has recently become one of the growing areas of research related to natural language processing and machine learning. Much opinion and sentiment about specific topics are available online, which allows several parties, such as customers, companies and even governments, to explore these opinions. The first task is to classify the text in terms of whether it expresses opinion or factual information. Polarity classification is the second task, which distinguishes between the polarities (positive, negative or neutral) that sentences may carry. The analysis of natural language text for the identification of subjectivity and sentiment has been well studied for the English language. In contrast, the work that has been carried out for Arabic remains in its infancy; thus, more cooperation is required between research communities in order for them to offer a mature sentiment analysis system for Arabic. There are recognized challenges in this field, some of which are inherited from the nature of the Arabic language itself, while others are derived from the scarcity of tools and sources. This dissertation provides the rationale behind the current work and proposes methods to enhance the performance of sentiment analysis in the Arabic language. The first step is to increase the resources that help in the analysis process; the most important part of this task is to have annotated sentiment corpora. 
Several free corpora are available for the English language, but these resources are still limited in other languages, such as Arabic. This dissertation describes the work undertaken by the author to enrich sentiment analysis in Arabic by building a new Arabic Sentiment Corpus. The data is labeled not only with two polarities (positive and negative); the neutral sentiment is also used during the annotation process. The second step includes the proposal of features that may capture sentiment orientation in the Arabic language, as well as the use of different machine learning classifiers that may work better and capture the non-linearity of a richly morphological and highly inflectional language, such as Arabic. Different types of features are proposed. These proposed features try to capture different aspects and characteristics of Arabic. Morphological, semantic, and stylistic features are proposed and investigated. With regard to the classifier, the performance of linear and nonlinear machine learning approaches was compared. The results are promising for the continued use of nonlinear ML classifiers for this task. Learning knowledge from a particular dataset domain and applying it to a different domain is one useful method in the case of limited resources, such as with the Arabic language. This dissertation shows and discusses the possibility of applying cross-domain learning in the field of Arabic sentiment analysis. It also indicates the feasibility of using different mechanisms of the cross-domain method. Other work in this dissertation includes the exploration of the effect of negation in Arabic subjectivity and polarity classification. Negation word lists were devised to help in this and other natural language processing tasks. These words cover both types of Arabic, Modern Standard and some dialects. Two methods of dealing with negation in sentiment analysis in Arabic were proposed. 
The first method is based on a static approach that assumes that each sentence containing negation words is a negated sentence. To determine the effect of negation, different techniques were proposed, using different word window sizes or base phrase chunks. The second approach depends on a dynamic method that needs an annotated negation dataset in order to build a model that can determine whether or not the sentence is negated by the negation words and establish the effect of the negation on the sentence. The results achieved by adding negation to Arabic sentiment analysis were promising and indicate that negation has an effect on this task. Finally, the experiments and evaluations that were conducted in this dissertation encourage researchers to continue in this direction of research.
Item Open Access Solving dots & boxes using reinforcement learning (Colorado State University. Libraries, 2022) Pandey, Apoorv, author; Anderson, Charles W., advisor; Beveridge, James Ross, committee member; Chong, Edwin K. P., committee member
Reinforcement learning is being used to solve games which were previously deemed too complex to solve, the most notable example in recent years being DeepMind solving Go. Dots and boxes is a 2-person game, known by many names across the world and quite popular with children. Here, a reinforcement learning agent learns to play the game. The goal was to develop an agent which would learn to win games, could intelligently execute complex trapping strategies present in the game, and shed new light on game-playing strategy. A 3x3-sized dots and boxes board was used. The agent learned to defeat a random opponent with a win rate of over 80%, and the next version of the agent learned to defeat the previous agent with a win rate of over 99%. A full game analysis was performed for the agent. 
Unfortunately, the agent was not intelligent enough to defeat a human player.
Item Open Access Solving MDPs with thresholded lexicographic ordering using reinforcement learning (Colorado State University. Libraries, 2022) Tercan, Alperen, author; Prabhu, Vinayak S., advisor; Anderson, Charles W., advisor; Chong, Edwin K. P., committee member
Multiobjective problems with a strict importance order over the objectives occur in many real-life scenarios. While Reinforcement Learning (RL) is a promising approach with great potential to solve many real-life problems, the RL literature focuses primarily on single-objective tasks, and approaches that can directly address multiobjective problems with an importance order have been scarce. The few proposed approaches were noted to be heuristics without theoretical guarantees. However, we found that their practical applicability is very limited, as they fail to find a good solution even in very common scenarios. In this work, we first investigate these shortcomings of the existing approaches and propose some solutions that could improve their practical performance. Finally, we propose a completely different approach based on policy optimization using our Lexicographic Projection Optimization (LPO) algorithm and show its performance on some benchmark problems.
Item Open Access Sparse Bayesian reinforcement learning (Colorado State University. Libraries, 2017) Lee, Minwoo, author; Anderson, Charles W., advisor; Ben-Hur, Asa, committee member; Kirby, Michael, committee member; Young, Peter, committee member
This dissertation presents knowledge acquisition and retention methods for efficient and robust learning. We propose a framework for learning and memorizing, and we examine how we can use the memory for efficient machine learning. Temporal difference (TD) learning is a core part of reinforcement learning, and it requires function approximation. 
However, with function approximation, the most popular TD methods such as TD(λ), SARSA, and Q-learning lose stability and diverge, especially when the complexity of the problem grows and the sampling distribution is biased. The biased samples cause function approximators such as neural networks to respond quickly to the new data by losing what was previously learned. By systematically selecting the most significant experiences, our proposed approach gradually builds a snapshot memory. The memorized snapshots prevent forgetting important samples and increase learning stability. Our sparse Bayesian learning model keeps the snapshot memory sparse for efficiency in computation and memory. The Bayesian model extends and improves TD learning by using the state information in hyperparameters to make smart action-selection decisions and to filter out insignificant experience, which maintains the sparsity of the snapshots. The obtained memory can be used to further improve learning. First, the placement of the snapshot memories with a radial basis function kernel located at peaks of the value function approximation surface leads to an efficient way to search a continuous action space for practical application with fine motor control. Second, the memory is a knowledge representation for transfer learning. Transfer learning is a paradigm for knowledge generalization of machine learning and reinforcement learning. Transfer learning shortens the time for machine learning training by using the knowledge gained from similar tasks. The dissertation examines a practical approach that transfers the snapshots from non-goal-directed random movements to goal-driven reinforcement learning tasks. 
Experiments are described that demonstrate the stability and efficiency of learning in 1) traditional benchmark problems and 2) the octopus arm control problem without limiting or discretizing the action space.
Item Open Access Stability analysis of recurrent neural networks with applications (Colorado State University. Libraries, 2008) Knight, James N., author; Anderson, Charles W., advisor
Recurrent neural networks are an important tool in the analysis of data with temporal structure. The ability of recurrent networks to model temporal data and act as dynamic mappings makes them ideal for application to complex control problems. Because such networks are dynamic, however, application in control systems, where stability and safety are important, requires certain guarantees about the behavior of the network and its interaction with the controlled system. Both the performance of the system and its stability must be assured. Since the dynamics of controlled systems are never perfectly known, robust control requires that uncertainty in the knowledge of systems be explicitly addressed. Robust control synthesis approaches produce controllers that are stable in the presence of uncertainty. To guarantee robust stability, these controllers must often sacrifice performance on the actual physical system. The addition of adaptive recurrent neural network components to the controller can alleviate, to some extent, the loss of performance associated with robust design by allowing adaptation to observed system dynamics. The assurance of stability of the adaptive neural control system is prerequisite to the application of such techniques. Work in [49, 2] points toward the use of modern stability analysis and robust control techniques in combination with reinforcement learning algorithms to provide adaptive neural controllers with the necessary guarantees of performance and stability. 
The algorithms developed in these works have a high computational burden due to the cost of the online stability analysis. Conservatism in the stability analysis of the adaptive neural components has a direct impact on the cost of the proposed system. This is due to an increase in the number of stability analysis computations that must be made. The work in [79, 82] provides more efficient tools for the analysis of time-varying recurrent neural network stability than those applied in [49, 2]. Recent results in the analysis of systems with repeated nonlinearities [19, 52, 17] can reduce the conservatism of the analysis developed in [79] and give an overall improvement in the performance of the online stability analysis. In this document, steps toward making the application of robust adaptive neural controllers practical are described. The analysis of recurrent neural network stability in [79] is not exact, and reductions in the conservatism and computational cost of the analysis are presented. An algorithm is developed allowing the application of the stability analysis results to online adaptive control systems. The algorithm modifies the recurrent neural network updates with a bias away from the boundary between provably stable parameter settings and possibly unstable settings. This bias is derived from the results of the stability analysis, and its method of computation is applicable to a broad class of adaptive control systems not restricted to recurrent neural networks. The use of this bias term reduces the number of expensive stability analysis computations that must be made and thus reduces the computational complexity of the stable adaptive system. 
An application of the proposed algorithm to an uncertain, nonlinear control system is provided and points toward future work on this problem that could further the practical application of robust adaptive neural control.
Item Open Access The wisdom of the crowd: reliable deep reinforcement learning through ensembles of Q-functions (Colorado State University. Libraries, 2018) Elliott, Daniel L., author; Anderson, Charles W., advisor; Draper, Bruce, committee member; Kirby, Michael, committee member; Chong, Edwin, committee member
Reinforcement learning agents learn by exploring the environment and then exploiting what they have learned. This frees the human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that reinforcement learning can be slow and unstable during learning, exhibiting performance like that of a randomly initialized Q-function just a few parameter updates after solving the task. We explore the possibility that ensemble methods can remedy these shortcomings and do so by investigating a novel technique which harnesses the wisdom of the crowds by bagging Q-function approximator estimates. Our results show that this proposed approach improves performance on all tasks and with all reinforcement learning approaches attempted. We are able to demonstrate that this is a direct result of the increased stability of the action portion of the state-action-value function used by Q-learning to select actions and by policy gradient methods to train the policy. Recently developed methods attempt to solve these RL challenges at the cost of increasing the number of interactions with the environment by several orders of magnitude. On the other hand, the proposed approach has little downside for inclusion: it addresses RL challenges while reducing the number of interactions with the environment.
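
The idea of bagging Q-function estimates can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes small tabular Q-functions in place of neural network approximators, a fixed batch of transitions, and simple averaging of member estimates. All function names and parameters (`bootstrap_sample`, `fit_tabular_q`, `ensemble_q`, `greedy_action`, `train_ensemble`, `alpha`, `gamma`, `sweeps`) are hypothetical.

```python
import random
from statistics import mean


def bootstrap_sample(transitions, rng):
    """Draw a bootstrap replicate: same size, sampled with replacement."""
    return [rng.choice(transitions) for _ in transitions]


def fit_tabular_q(transitions, n_actions, alpha=0.5, gamma=0.9, sweeps=50):
    """Fit one tabular Q-function with Q-learning updates over a fixed batch.

    Assumption: each transition is (state, action, reward, next_state, done).
    """
    q = {}

    def row(state):
        # Unseen states start with all-zero action values.
        return q.setdefault(state, [0.0] * n_actions)

    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            target = r if done else r + gamma * max(row(s_next))
            row(s)[a] += alpha * (target - row(s)[a])
    return q


def train_ensemble(transitions, n_actions, n_members=5, seed=0):
    """Bagging: train each member on its own bootstrap replicate of the data."""
    rng = random.Random(seed)
    return [fit_tabular_q(bootstrap_sample(transitions, rng), n_actions)
            for _ in range(n_members)]


def ensemble_q(ensemble, state, n_actions):
    """Average the members' Q-value estimates for every action in `state`."""
    zeros = [0.0] * n_actions
    return [mean(q.get(state, zeros)[a] for q in ensemble)
            for a in range(n_actions)]


def greedy_action(ensemble, state, n_actions):
    """Select the action with the highest averaged Q-value."""
    values = ensemble_q(ensemble, state, n_actions)
    return max(range(n_actions), key=values.__getitem__)
```

For example, after `ens = train_ensemble(batch, n_actions=2)` on a hypothetical `batch` of transitions, `greedy_action(ens, state, 2)` selects actions from the averaged estimates rather than from any single member, which is the sense in which the ensemble is intended to smooth out the instability of individual Q-functions.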