Theses and Dissertations


Recent Submissions

Now showing 1 - 20 of 315
  • ItemEmbargo
    Cooking up a better AR experience: notification design and the liabilities of imperfect cues in augmented reality
    (Colorado State University. Libraries, 2024) Raikwar, Aditya R., author; Ortega, Francisco R., advisor; Ray, Indrakshi, committee member; Moraes, Marcia, committee member; Soto, Hortensia, committee member
    This dissertation investigates optimizing user experience in Augmented Reality (AR). A virtual cooking environment (ARtisan Bistro) serves as a testbed to explore factors influencing user interaction with AR interfaces. The research starts with notification design, examining strategically placed visual and audio notifications in ARtisan Bistro (Chapter 4). Building on this, Chapter 5 explores optimizing these designs for user awareness and delivering critical information, especially when audio is impractical. This involved exploring visual-only notifications, revealing consistent user performance and attention capture comparable to combined visual-audio notifications (no significant difference found). The research demonstrates that well-designed notifications can significantly improve user experience, but it also raises a crucial question: can users always trust the information presented in AR environments? The possibility of imperfect cues underscores the importance of reliable information delivery. Chapter 6 explores the impact of imperfect cues generated by machine learning (ML) on user performance in AR visual search tasks. This research highlights the potential for automation bias when users rely heavily on unreliable cues. By investigating both notification design and the limitations of ML systems for reliable information delivery, this dissertation emphasizes the importance of creating a well-rounded user experience in AR environments. The findings underscore the need for further research on optimizing visual notifications, mitigating automation bias, and ensuring reliable information delivery in AR applications.
  • ItemOpen Access
    Smart transfers: challenges and opportunities in boosting low-resource language models with high-resource language power
    (Colorado State University. Libraries, 2024) Manafi, Shadi, author; Krishnaswamy, Nikhil, advisor; Ortega, Francisco R., committee member; Blanchard, Nathaniel, committee member; Chong, Edwin K. P., committee member
    Large language models (LLMs) are predominantly built for high-resource languages (HRLs), leaving low-resource languages (LRLs) underrepresented. To bridge this gap, knowledge transfer from HRLs to LRLs is crucial, but it must be sensitive to LRL-specific traits and not biased toward an HRL with larger training data. This dissertation addresses the opportunities and challenges of cross-lingual transfer in two main streams. The first stream explores cross-lingual zero-shot learning in Multilingual Language Models (MLLMs) like mBERT and XLM-R for tasks such as Named Entity Recognition (NER) and section-title prediction. The research introduces adversarial test sets by replacing named entities and modifying common words to evaluate transfer accuracy. Results show that word overlap between languages is essential for both tasks, highlighting the need to account for language-specific features and biases. The second stream develops sentence Transformers, which generate sentence embeddings by mean-pooling contextualized word embeddings. However, these embeddings often struggle to capture sentence similarities effectively. To address this, we fine-tuned an English sentence Transformer by leveraging a word-to-word translation approach and a triplet loss function (sketched below). Despite using a pre-trained English BERT model and only word-by-word translations without accounting for sentence structure, the results were competitive. This suggests that mean-pooling may weaken attention mechanisms, causing the model to rely more on word embeddings than sentence structure, potentially limiting comprehension of sentence meaning. Together, these streams reveal the complexities of cross-lingual transfer, guiding more effective and equitable use of HRLs to support LRLs in NLP applications.
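    A minimal sketch of the mean-pooling and triplet-loss setup described in the abstract above, assuming a generic Hugging Face BERT encoder; the model name, margin, and example sentences are illustrative placeholders, not the dissertation's actual configuration.
    ```python
    # Hedged sketch: mean-pooled sentence embeddings trained with a triplet loss,
    # where the positive is a word-by-word translation of the anchor.
    # Model name, margin, and sentences are placeholders.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc = AutoModel.from_pretrained("bert-base-uncased")

    def embed(sentences):
        """Mean-pool contextualized token embeddings into one vector per sentence."""
        batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state            # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
        return (hidden * mask).sum(1) / mask.sum(1)        # masked mean over tokens

    triplet = torch.nn.TripletMarginLoss(margin=0.5)
    loss = triplet(embed(["a cat sleeps"]),                # anchor (English)
                   embed(["un gato duerme"]),              # positive (word-by-word translation)
                   embed(["the market fell sharply"]))     # negative (unrelated sentence)
    loss.backward()                                        # gradients flow into the encoder
    ```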
  • ItemOpen Access
    From neuro-inspired attention methods to generative diffusion: applications to weather and climate
    (Colorado State University. Libraries, 2024) Stock, Jason, author; Anderson, Chuck, advisor; Ebert-Uphoff, Imme, committee member; Krishnaswamy, Nikhil, committee member; Sreedharan, Sarath, committee member
    Machine learning presents new opportunities for addressing the complexities of atmospheric science, where high-dimensional, sparse, and variable data challenge traditional methods. This dissertation introduces a range of algorithms, motivated specifically by the intricacies of weather and climate applications. These challenges complement those that are fundamental in machine learning, such as extracting relevant features, generating high-quality imagery, and providing interpretable model predictions. To this end, we propose methods to integrate adaptive wavelets and spatial attention into neural networks, showing improvements on tasks with limited data. We design a memory-based model of sequential attention to expressively contextualize a subset of image regions. Additionally, we explore transformer models for image translation, with an emphasis on explainability, that overcome the limitations of convolutional networks. Lastly, we discover meaningful long-range dynamics in oscillatory data from an autoregressive generative diffusion model---a very different approach from the current physics-based models. These methods collectively improve predictive performance and deepen our understanding of both the underlying algorithmic and physical processes. The generality of most of these methods is demonstrated on synthetic data and classical vision tasks, but we place a particular emphasis on their impact in weather and climate modeling. Some notable examples include an application to estimate synthetic radar from satellite imagery, predicting the intensity of tropical cyclones, and modeling global climate variability from observational data for intraseasonal predictability. These approaches, however, are flexible and hold potential for adaptation across various application domains and data modalities.
  • ItemEmbargo
    Learning technical Spanish with virtual environments
    (Colorado State University. Libraries, 2024) Siebert, Caspian, author; Ortega, Francisco R., advisor; Miller De Rutté, Alyssia, committee member; Krishnaswamy, Nikhil, committee member
    As the world becomes increasingly interconnected through the internet and travel, foreign language learning is essential for accurate communication and a deeper appreciation of diverse cultures. This study explores the effectiveness of a virtual learning environment employing Artificial Intelligence (AI) designed to facilitate Spanish language acquisition among veterinary students in the context of diagnosing a pet. Students' engagement with virtual scenarios that simulate real-life veterinary consultations in Spanish is examined using a qualitative thematic analysis. Participants have conversations with a virtual pet owner, discussing symptoms, diagnosing conditions, and recommending treatments, all in Spanish. Data was collected through recorded interactions with the application and a semi-structured interview. Findings suggest that immersive virtual environments enhance user engagement and interest, and several suggestions were made to improve the application's features. The study highlights the potential for virtual simulations to bridge the gap between language learning and professional training in specialized fields such as veterinary medicine. Finally, a set of design implications for future systems is provided.
  • ItemOpen Access
    Towards heterogeneity-aware automatic optimization of time-critical systems via graph machine learning
    (Colorado State University. Libraries, 2024) Canizales Turcios, Ronaldo Armando, author; McClurg, Jedidiah, advisor; Rajopadhye, Sanjay, committee member; Pasricha, Sudeep, committee member
    Modern computing's hardware architecture is increasingly heterogeneous, making optimization challenging, particularly on time-critical systems where correct results are as important as low execution time. First, we explore a case study on the manual optimization of an earthquake-engineering application, where we parallelized the processing of accelerographic records. Second, we present egg-no-graph, our novel code-to-graph representation based on equality saturation, which outperforms state-of-the-art methods at estimating execution time. Third, we show how our 150M+ instance heterogeneity-aware dataset was built. Lastly, we redesign a graph-level embedding algorithm, making it converge orders of magnitude faster while maintaining accuracy similar to the state of the art on our downstream task, thus making it feasible for use on time-critical systems.
  • ItemOpen Access
    In pursuit of industrial like MAXSAT with reduced MAX-3SAT random generation
    (Colorado State University. Libraries, 2024) Floyd, Noah R., author; Whitley, Darrell, advisor; Sreedharan, Sarath, committee member; Aristoff, David, committee member
    In the modern landscape of MAXSAT, there are two broad classifications of problems: random MAX-3SAT and industrial MAXSAT. Random MAX-3SAT problems are generated by randomly sampling variables with uniform probability and randomly assigning signs to the variables, one clause at a time (a procedure sketched below). Industrial MAXSAT consists of MAX-3SAT problems as encountered in the real world, which generally have lower nonlinearity than random MAX-3SAT instances. One of the goals of recent research has been to figure out which rules and structures these industrial problems follow and how to replicate them randomly. This paper builds on the paper "Reduction-Based MAX-3SAT with Low Nonlinearity and Lattices Under Recombination," implementing its approach to MAX-3SAT clause generation and determining what it can reveal about industrial MAX-3SAT and random MAX-3SAT. It builds on the transformation from SAT to MAX-SAT problems and aims to create random MAXSAT problems that are more representative of industrial MAXSAT problems. All this is in pursuit of random MAX-3SAT that more accurately maps onto real-world MAX-3SAT instances, so that more efficient MAX-3SAT solvers can be produced.
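    A minimal sketch of the uniform random MAX-3SAT generation procedure described above (sample three distinct variables uniformly and give each a random sign); the variable and clause counts are illustrative.
    ```python
    # Hedged sketch: uniform random MAX-3SAT clause generation as described above.
    import random

    def random_max3sat(num_vars, num_clauses, seed=0):
        rng = random.Random(seed)
        clauses = []
        for _ in range(num_clauses):
            chosen = rng.sample(range(1, num_vars + 1), 3)                   # three distinct variables
            clause = tuple(v if rng.random() < 0.5 else -v for v in chosen)  # random signs
            clauses.append(clause)
        return clauses

    # e.g. 100 variables with a clause-to-variable ratio of about 4.3
    print(random_max3sat(num_vars=100, num_clauses=430)[:3])
    ```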
  • ItemOpen Access
    Towards generating a pre-training image transformer framework for preserving spatio-spectral properties in hyperspectral satellite images
    (Colorado State University. Libraries, 2024) Faruk, Tanjim Bin, author; Pallickara, Sangmi Lee, advisor; Pallickara, Shrideep, advisor; Cotrufo, M. Francesca, committee member
    Hyperspectral images facilitate advanced geospatial analysis without the need for expensive ground surveys. Machine learning approaches are particularly well-suited for handling the geospatial coverage required by these applications. While self-supervised learning is a promising methodology for managing voluminous datasets with limited labels, existing encoders in self-supervised learning face challenges when applied to hyperspectral images due to the large number of spectral channels. We propose a novel hyperspectral image encoding framework designed to generate highly representative embeddings for subsequent geospatial analysis. Our framework extends the Vision Transformer model with dynamic masking strategies to enhance model performance in regions with high spatial variability. We introduce a novel loss function that incorporates spectral quality metrics and employs the unique channel grouping strategy to leverage spectral similarity across channels. We demonstrate the effectiveness of our approach through a downstream model for estimating soil texture at a 30-meter resolution.
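    A minimal sketch of one way to group hyperspectral channels by spectral similarity, using band-to-band correlation as a crude stand-in for the grouping strategy mentioned in the abstract above; the cube shape and group count are illustrative.
    ```python
    # Hedged sketch: group hyperspectral channels by spectral similarity,
    # approximated here with band-to-band correlation (a crude stand-in).
    import numpy as np

    def group_channels(cube, num_groups):
        """cube: (H, W, C) hyperspectral image; returns num_groups arrays of channel indices."""
        bands = cube.reshape(-1, cube.shape[-1])        # (H*W, C) pixels-by-channels
        corr = np.corrcoef(bands, rowvar=False)         # (C, C) channel correlation matrix
        order = np.argsort(corr[0])                     # order channels by similarity to channel 0
        return np.array_split(order, num_groups)        # split the ordering into groups

    cube = np.random.rand(32, 32, 200)                  # toy 200-band image
    print([len(g) for g in group_channels(cube, num_groups=10)])
    ```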
  • ItemOpen Access
    Exploring remote sensing data with high temporal resolutions for wildfire spread prediction
    (Colorado State University. Libraries, 2024) Fitzgerald, Jack, author; Blanchard, Nathaniel, advisor; Krishnaswamy, Nikhil, committee member; Zimmerle, Dan, committee member
    The severity of wildfires has been steadily increasing in the United States over the past few decades, burning up many millions of acres and costing billions of dollars in suppression efforts each year. However, in the same few decades there have been great strides made to advance our technological capabilities. Machine learning is one such technology that has seen spectacular improvements in many areas such as computer vision and natural language processing, and is now being used extensively to model spatiotemporal phenomena such as wildfires via deep learning. Leveraging deep learning to model how wildfires spread can help facilitate evacuation efforts and assist wildland firefighters by highlighting key areas where containment and suppression efforts should be focused. Many recent works have examined the feasibility of using deep learning models to predict when and where wildfires will spread to, which has been enabled in part due to the wealth of geospatial information that is now publicly available and easily accessible on platforms such as Google Earth Engine. In this work, the First Week Wildfire Spread dataset is introduced, which seeks to address some of the limitations with previously released datasets by having an increased focus on geospatial data with high temporal resolutions. The new dataset contains weather, fuel, topography, and fire location data for the first 7 days of 56 megafires that occurred in the Contiguous United States from 2020 to 2024. Fire location data is collected by the Advanced Baseline Imager aboard the GOES-16 satellite, which provides updates every 5 minutes. Baseline experiments are performed using U-Net and ConvLSTM models to demonstrate some of the various ways that the First Week Wildfire Spread dataset can be used and to highlight its versatility.
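    A minimal sketch of how 5-minute fire-detection masks might be windowed into input/target pairs for spread prediction; the array shapes, window length, and one-step horizon are assumptions, not the First Week Wildfire Spread dataset's actual format.
    ```python
    # Hedged sketch: turning a stack of 5-minute fire-detection masks into
    # (past window -> next step) training pairs for spread prediction.
    # Shapes, window length, and horizon are illustrative assumptions.
    import numpy as np

    def make_pairs(fire_masks, window=12, horizon=1):
        """fire_masks: (T, H, W) binary masks at 5-minute cadence.
        Returns inputs (N, window, H, W) and targets (N, H, W)."""
        X, y = [], []
        for t in range(window, fire_masks.shape[0] - horizon + 1):
            X.append(fire_masks[t - window:t])
            y.append(fire_masks[t + horizon - 1])
        return np.stack(X), np.stack(y)

    masks = np.random.randint(0, 2, size=(288, 64, 64))     # one day of 5-minute masks
    X, y = make_pairs(masks, window=12)                      # 1-hour history per sample
    print(X.shape, y.shape)                                  # (276, 12, 64, 64) (276, 64, 64)
    ```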
  • ItemOpen Access
    OnDiscuss: visualizing asynchronous online discussions through an epistemic network analysis tool
    (Colorado State University. Libraries, 2024) Luther, Yanye, author; Moraes, Marcia, advisor; Ghosh, Sudipto, committee member; Folkestad, James, committee member
    Asynchronous online discussions are common assignments in both hybrid and online courses to promote critical thinking and collaboration among students. However, the evaluation of these assignments can require considerable time and effort from instructors. We created OnDiscuss, a learning analytics visualization tool for instructors that utilizes text mining algorithms and Epistemic Network Analysis (ENA) to generate visualizations of student discussion data. Natural language processing and text mining techniques are used to generate an initial codebook for the instructor as well as automatically code the data. This tool allows instructors to edit their codebook and then view the resulting ENA networks for the entire class and individual students. Through empirical investigation, we assess this tool's effectiveness to help instructors in analyzing asynchronous online discussion assignments. Our findings highlight several key insights regarding the implications of this tool for enhancing the accessibility and usability of ENA as a learning analytics visualization tool. OnDiscuss is helpful to those unfamiliar with ENA since it abstracts many of the intricacies of ENA by providing an easy interface to manipulate a codebook and thus the resulting ENA networks. Future refinements, such as the addition of a baseline ENA model, can make it more helpful to those familiar with ENA. Despite the tool's automated keyword generation capabilities, it is clear that instructor intervention remains crucial for refining the codebook. Therefore, while automated techniques like Latent Dirichlet Allocation (LDA) provide valuable insights given a large amount of data, these processes must be complemented by expert guidance.
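    A minimal sketch of LDA-based codebook seeding in the spirit of the automated keyword generation described above, using scikit-learn; the example posts, vectorizer settings, and topic count are illustrative.
    ```python
    # Hedged sketch: derive candidate codebook keywords from discussion posts with LDA.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    posts = [
        "I think the algorithm fails on edge cases because of recursion depth",
        "Collaboration in the group helped me understand recursion better",
        "The grading rubric for the project was unclear to me",
    ]

    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(posts)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    terms = vec.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [terms[i] for i in topic.argsort()[-5:][::-1]]   # top-5 words per topic
        print(f"candidate code {k}: {top}")
    ```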
  • ItemOpen Access
    Exploring the role of biomass design in virtual reality forest bathing
    (Colorado State University. Libraries, 2024) Masters, Rachel A., author; Ortega, Francisco R., advisor; Interrante, Victoria, committee member; Lionelle, Albert, committee member; Moraes, Marcia, committee member; LoTemplio, Sara, committee member
    Stress is an increasingly prevalent problem that has severe health consequences if not managed properly. Every day, people are surrounded by work, health, financial, economic, and a variety of other stressors that deplete cognitive resources and put their nervous systems on high alert. Forest bathing, or nature immersion therapy, has been shown to reduce stress while restoring attentional resources, but despite these benefits, many people lack access to nature for a variety of reasons, including distance and health. VR has the potential to support access to virtual nature environments (VNEs) for people who cannot get into nature, yet the optimal design of biomass or plant life in VNEs is still an active area of research. Additionally, most of these VNEs require high-end headsets and computers to run, which is not accessible technology for the everyday consumer. Given the current limitations of popular VR technology such as the Meta Quest 3, it is important to understand the relationship between plant asset realism and a VNE's restorative potential so that a balance can be achieved between a VNE that is deployable on everyday consumer headsets and a VNE that offers restorative benefit. This study was an initial exploration into high- and low-realism VNE comparisons, accomplished by a mixed design study that compared two groups of participants, high and low-realism, against each other as well as against their own performance in a control condition where they closed their eyes. Through psychological and physiological measures, stress reduction and perceived attention restoration were assessed as a baseline, after a stressor test, then after the experiment condition to observe potential decreases in stress and increases in attention after the environment. Overall, there was only a significant increase in General Restorativeness in the high-realism environment when compared against the control and the low-realism environment, but trends in the data call for future research on this topic.
  • ItemOpen Access
    Goal alignment: re-analyzing value alignment problems using human-aware AI
    (Colorado State University. Libraries, 2024) Mechergui, Malek, author; Sreedharan, Sarath, advisor; Blanchard, Nathaniel, committee member; Pezeshki, Ali, committee member
    While the question of misspecified objectives has received much attention in recent years, most works in this area primarily focus on the challenges related to the complexity of the objective specification mechanism, for example, the use of reward functions. However, the complexity of the objective specification mechanism is just one of many reasons why the user may have misspecified their objective. A foundational cause of misspecification that has been overlooked by previous works is the inherent asymmetry between human expectations about the agent's behavior and the behavior generated by the agent for the specified objective. To address this, we propose a novel formulation for the objective misspecification problem that builds on the human-aware planning literature, which was originally introduced to support explanation and explicable behavior generation. Additionally, we propose a first-of-its-kind interactive algorithm that is capable of using information generated under incorrect beliefs about the agent to determine the true underlying goal of the user.
  • ItemOpen Access
    Guiding gaze, evaluating visual cue designs for augmented reality
    (Colorado State University. Libraries, 2024) Kelley, Brendan, author; Ortega, Francisco R., advisor; Tornatzky, Cyane, committee member; Arefin, Mohammed Safayet, committee member
    Visual cueing is an interdisciplinary and complex topic that has garnered interest for implementation in extended reality (XR). Both augmented reality (AR) and virtual reality (VR) are often employed for visual search tasks. Visual search, a paradigm rooted in cognitive psychology (in particular attention theory), can often benefit from cueing interventions. However, there are several potential pitfalls in using cueing techniques in AR, namely automation bias, clutter, and cognitive overload. These factors are tied to design and implementation choices such as modality, representation, dimensionality, reference frame, conveyed information, purpose, markedness, and task domain. Design factors are subject to both cognitive factors and the technical specifications of the display technology. To address these factors, this work proposes a within-subject, four-factor design addressing the question: how do different cue designs affect visual search performance? Four cueing conditions are used: no cue (baseline), gaze line, 2D wedge, and 3D arrow. Results support the use of cues for visual search; however, the gaze line condition in particular provided the fastest search time, the highest accuracy, and the greatest reduction in head rotation. Additionally, the gaze line cue was preferred by participants and produced more favorable NASA TLX scores.
  • ItemOpen Access
    Trust based access control and its administration for smart IoT devices
    (Colorado State University. Libraries, 2024) Promi, Zarin Tasnim, author; Ray, Indrajit, advisor; Ray, Indrakshi, committee member; Vijayasarathy, Leo R., committee member
    In today's interconnected world, the security of Internet of Things (IoT) devices is paramount, given the range of smart devices from household appliances to industrial machinery. The continuous, long-term operation of IoT networks increases vulnerability to attacks, and the limited capabilities of IoT devices render standard security measures less effective. Traditional cryptographic methods used for establishing trust through identification and authentication face challenges in IoT contexts due to their computational demands and scalability concerns. Additionally, administration for these intricate networks can become extensive, and the presence of malicious or unskilled human operators can further increase security risks. To combat these issues, adopting a "Zero Trust - Never Trust, Always Verify" strategy is vital in IoT environments. Our approach involves creating an access control model based on device trust, which continuously evaluates the trustworthiness of connected devices and dynamically modifies their access rights according to their trust levels (a minimal sketch follows below). This enables adaptive and fine-grained access control in IoT settings. Furthermore, we propose a trust-based administrative framework that enables policy configuration, enhancing security and administrative efficiency in IoT networks. Similar to the access control model, this approach will continuously monitor operator behavior and adjust operational privileges based on operators' actions.
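    A minimal sketch of the adaptive idea described above, mapping a continuously updated device trust score to access levels; the update rule, weights, and thresholds are invented for illustration and are not the dissertation's actual model.
    ```python
    # Hedged sketch: continuously updated device trust mapped to access levels.
    # Scores, weights, and thresholds are invented for illustration.

    def update_trust(trust, observation, alpha=0.2):
        """Exponentially weighted update from a 0..1 behavioral observation."""
        return (1 - alpha) * trust + alpha * observation

    def access_level(trust):
        if trust >= 0.8:
            return "full"        # read/write/configure
        if trust >= 0.5:
            return "read-only"
        return "quarantined"     # re-verify before granting any access

    trust = 0.9
    for obs in [1.0, 0.2, 0.1, 0.0]:      # device behavior degrades over time
        trust = update_trust(trust, obs)
        print(f"trust={trust:.2f} -> {access_level(trust)}")
    ```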
  • ItemOpen Access
    Automatically simplifying reductions
    (Colorado State University. Libraries, 2024) Job, Ryan, author; Rajopadhye, Sanjay, advisor; Pallickara, Shrideep, committee member; Snow, Christopher, committee member
    When developing software from a mathematical model, the efficiency of the model and the code which implements it both have significant impacts on the runtime performance of the software. The reduction simplification transformation can be used to automatically provide these benefits, improving the runtime performance of programs while simultaneously making it easier to specify a program. This work, which was done in collaboration with Louis Narmour based on a partial implementation by Tomofumi Yuki, tackles the theoretical gaps in this transformation and provides the first complete, automatic implementation of reduction simplification in a compiler. We demonstrate its effectiveness using the real-world problem of RNA secondary structure prediction. Our compiler automatically rediscovers the known optimization for this problem, which required significant human effort to initially discover and implement. In addition, our compiler discovers several previously unknown optimizations for this problem and generates a C implementation of all optimized programs.
  • ItemEmbargo
    Comparing memorability of gesture sets in an extended reality application
    (Colorado State University. Libraries, 2024) Holen, Ethan J., author; Ortega, Francisco R., advisor; Sreedharan, Sarath, committee member; Rhodes, Matthew, committee member
    In free-form gesture sets, memorability is an important yet often under-explored metric, despite evidence that the usability of interfaces improves when designed with more memorable input gestures. This study examines the memorability of three free-form gesture sets in the HoloLens 2: user-defined, elicitation-defined, and expert-defined. In addition, we examine gestures selected by the participants using common techniques from previous elicitation studies. We found that the user-defined gesture set was the most memorable, with an 88.57% recall rate, and was significantly more memorable than the expert-defined (72.73% recall) and elicitation-defined (59.87% recall) sets. This study also analyzed the user-defined gestures from this experiment. Although this was not an elicitation study, many of the methods commonly used in elicitation studies were used here. This analysis found a higher agreement rate when users were primed with a single gesture set before creating their own and a decrease in agreement when they were shown two gesture sets beforehand. Given these results, we propose that designing systems with user-defined gestures will result in the most memorable sets; however, expert-defined gesture sets are also highly memorable and may better suit application design constraints.
  • ItemOpen Access
    Embodied multimodal referring expressions generation
    (Colorado State University. Libraries, 2024) Alalyani, Nada H., author; Krishnaswamy, Nikhil, advisor; Ortega, Francisco, committee member; Blanchard, Nathaniel, committee member; Wang, Haonan, committee member
    Using both verbal and non-verbal modalities in generating definite descriptions of objects and locations is a critical human capability in collaborative interactions. Despite advancements in AI, embodied interactive virtual agents (IVAs) are not equipped to intelligently mix modalities to communicate their intents as humans do, which hamstrings naturalistic multimodal IVA. We introduce SCMRE, a situated corpus of multimodal referring expressions (MREs) intended for training generative AI systems in multimodal IVA, focusing on multimodal referring expressions. Our contributions include: 1) Developing an IVA platform that interprets human multimodal instructions and responds with language and gestures; 2) Providing 24 participants with 10 scenes, each involving ten equally-sized blocks randomly placed on a table. These interactions generated a dataset of 10,408 samples; 3) Analyzing SCMRE, revealing that the utilization of pointing significantly reduces the ambiguity of prompts and increases the efficiency of IVA's execution of humans' prompts; 4) Augmenting and synthesizing SCMRE, resulting in 22,159 samples to generate more data for model training; 5) Finetuning LLaMA 2-chat-13B for generating contextually-correct and situationally-fluent multimodal referring expressions; 6) Integrating the fine-tuned model into the IVA to evaluate the success of the generative model-enabled IVA in communication with humans; 7) Establishing the evaluation process which applies to both humans and IVAs and combines quantitative and qualitative metrics.
  • ItemEmbargo
    Interaction and navigation in cross-reality analytics
    (Colorado State University. Libraries, 2024) Zhou, Xiaoyan, author; Ortega, Francisco, advisor; Ray, Indrakshi, committee member; Moraes, Marcia, committee member; Batmaz, Anil Ufuk, committee member; Malinin, Laura, committee member
    Along with immersive display technology's fast evolution, augmented reality (AR) and virtual reality (VR) are increasingly being researched to facilitate data analytics, known as Immersive Analytics. The ability to interact with data visualization in the space around users not only builds the foundation of ubiquitous analytics but also assists users in the sensemaking of the data. However, interaction and navigation while making sense of 3D data visualization in different realities still need to be better understood and explored. For example, what are the differences between users interacting in augmented and virtual reality, and how can we utilize them in the best way during analysis tasks? Moreover, based on existing work and our preliminary studies, interaction efficiency with immersive displays still needs to be improved. Therefore, this thesis focuses on understanding interaction and navigation in augmented reality and virtual reality for immersive analytics. First, we explored how users interact with multiple objects in augmented reality by using the "Wizard of Oz" study approach. We elicited multimodal interactions involving hand gestures and speech, with text prompts shown on the head-mounted display. Then, we compared the results with previous work in a single-object scenario, which helped us better understand how users prefer to interact in a more complex AR environment. Second, we built an immersive analytics platform in both AR and VR environments to simulate a realistic scenario and conducted a controlled study to evaluate user performance with designed analysis tools and 3D data visualization. Based on the results, interaction and navigation patterns were observed and analyzed for a better understanding of user preferences during the sensemaking process. Lastly, by considering the findings and insights from prior studies, we developed a hybrid user interface in simulated cross-reality for situated analytics. An exploratory study was conducted with a smart home setting to understand user interaction and navigation in a more familiar scenario with practical tasks. With the results, we did a thorough qualitative analysis of feedback and video recordings to disclose user preferences with interaction and visualization in situated analytics in the everyday decision-making scenario. In conclusion, this thesis uncovered user-designed multimodal interaction including mid-air hand gestures and speech for AR, users' interaction and navigation strategies in immersive analytics in both AR and VR, and hybrid user interface usage in situated analytics for assisting decision-making. Our findings and insights in this thesis provide guidelines and inspiration for future research in interaction and navigation design and improving user experience with analytics in mixed-reality environments.
  • ItemEmbargo
    Towards automated security and privacy policies specification and analysis
    (Colorado State University. Libraries, 2024) Alqurashi, Saja Salem, author; Ray, Indrakshi, advisor; Ray, Indrajit, committee member; Malaiya, Yashwant, committee member; Simske, Steve, committee member
    Security and privacy policies, vital for information systems, are typically expressed in natural language documents. Security policy is represented by Access Control Policies (ACPs) within security requirements, initially drafted in natural language and subsequently translated into enforceable policy. The unstructured and ambiguous nature of the natural language documents makes the manual translation process tedious, expensive, labor-intensive, and prone to errors. On the other hand, privacy policy, with its length and complexity, presents unique challenges. The dense language and extensive content of the privacy policies can be overwhelming, hindering both novice users and experts from fully understanding the practices related to data collection and sharing. The disclosure of these data practices to users, as mandated by privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), is of utmost importance. To address these challenges, we have turned to Natural Language Processing (NLP) to automate extracting critical information from natural language documents and analyze those security and privacy policies. Thus, this dissertation aims to address two primary research questions: Question 1: How can we automate the translation of Access Control Policies (ACPs) from natural language expressions to the formal model of Next Generation Access Control (NGAC) and subsequently analyze the generated model? Question 2: How can we automate the extraction and analysis of data practices from privacy policies to ensure alignment with privacy regulations (GDPR and CCPA)? Addressing these research questions necessitates the development of a comprehensive framework comprising two key components. The first component, SR2ACM, focuses on translating natural language ACPs into the NGAC model. This component introduces a series of innovative contributions to the analysis of security policies. At the core of our contributions is an automated approach to constructing ACPs within the NGAC specification directly from natural language documents. Our approach integrates machine learning with software testing, a novel methodology to ensure the quality of the extracted access control model. The second component, Privacy2Practice, is designed to automate the extraction and analysis of the data practices from privacy policies written in natural language. We have developed an automated method to extract data practices mandated by privacy regulations and to analyze the disclosure of these data practices within the privacy policies. The novelty of this research lies in creating a comprehensive framework that identifies the critical elements within security and privacy policies. Thus, this innovative framework enables automated extraction and analysis of both types of policies directly from natural language documents.
  • ItemOpen Access
    SMOKE+: a video dataset for automated fine-grained assessment of smoke opacity
    (Colorado State University. Libraries, 2024) Seefried, Ethan, author; Blanchard, Nathaniel, advisor; Sreedharan, Sarath, committee member; Roberts, Jacob, committee member
    Computer vision has traditionally faced difficulties when applied to amorphous objects like smoke, owing to their ever-changing shape, texture, and dependence on background conditions. While recent advancements have enabled simple tasks such as smoke detection and basic classification (black or white), quantitative opacity estimation in line with the assessments made by certified professionals remains unexplored. To address this gap, I introduce the SMOKE+ dataset, which features opacity labels verified by three certified experts. My dataset encompasses five distinct testing days, two data collection sites in different regions, and a total of 13,632 labeled clips. Leveraging this data, we develop a state-of-the-art smoke opacity estimation method that employs a small number of Residual 3D blocks for efficient opacity estimation (a block of this kind is sketched below). Additionally, I explore the use of MAMBA blocks in a video-based architecture, exploiting their ability to handle spatial and temporal data in a linear fashion. Techniques developed during the SMOKE+ dataset creation were then refined and applied to a new dataset titled CSU101, designed for educational use in Computer Vision. In the future, I intend to expand further into synthetic data, incorporating techniques into Unreal Engine or Unity to add accurate opacity labels.
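    A minimal sketch of a residual 3D convolution block of the kind mentioned above, written in PyTorch; the channel count, normalization, and clip shape are illustrative choices, not the dissertation's architecture.
    ```python
    # Hedged sketch: a residual block built from 3D convolutions for video
    # (frames x height x width) input. Channels and normalization are placeholders.
    import torch
    import torch.nn as nn

    class Residual3DBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm3d(channels),
                nn.ReLU(inplace=True),
                nn.Conv3d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm3d(channels),
            )
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.body(x) + x)   # skip connection around the conv stack

    clip = torch.randn(2, 16, 8, 112, 112)      # (batch, channels, frames, H, W)
    print(Residual3DBlock(16)(clip).shape)      # same shape as the input
    ```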
  • ItemOpen Access
    Toward robust embedded networks in heavy vehicles - machine learning strategies for fault tolerance
    (Colorado State University. Libraries, 2024) Ghatak, Chandrima, author; Ray, Indrakshi, advisor; Malaiya, Yashwant, committee member; Kokoszka, Piotr, committee member
    In the domain of critical infrastructure, Medium and Heavy Duty (MHD) vehicles play an integral role in both military and civilian operations. These vehicles are essential for the efficiency and reliability of modern logistics. The operations of modern MHD vehicles are heavily automated through embedded computers called Electronic Control Units (ECUs). These ECUs utilize arrays of sensors to control and optimize various vehicle functions and are critical to maintaining operational effectiveness. In most MHD vehicles, this sensor data is predominantly communicated using the Society of Automotive Engineering's (SAE) J1939 Protocol over Controller Area Networks (CAN) and is vital for the smooth functioning of MHD vehicles. The resilience of these communication networks is especially crucial in adversarial environments where sensor systems are susceptible to disruptions caused by physical (kinetic) or cyber-attacks. This dissertation presents an innovative approach using predictive machine learning algorithms to forecast accurate sensor readings in scenarios where sensor systems become compromised. The study focuses on the SAE J1939 networks in MHD vehicles, utilizing real-world data from a Class 6 Kenworth T270 truck. Three distinct machine-learning methods are explored and evaluated for their effectiveness in predicting missing sensor data. The results demonstrate that these models can nearly accurately predict sensor data, which is essential in preventing the vehicle from entering engine protection or limp modes, thereby extending operational capacity under adverse conditions. Overall, this research highlights the potential of machine learning in enhancing the resilience of networked cyber-physical systems, particularly in MHD vehicles. It underscores the significance of predictive algorithms in maintaining operational feasibility and contributes to the broader discussion on the resilience of critical infrastructure in hostile settings.
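    A minimal sketch of predicting a compromised sensor's readings from other channels, in the spirit of the approach described above; the signal names, synthetic data, and model choice are illustrative, not the dissertation's actual J1939 setup.
    ```python
    # Hedged sketch: predict one sensor channel from surviving CAN/J1939 channels.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    engine_rpm   = rng.uniform(600, 2200, n)
    throttle_pct = rng.uniform(0, 100, n)
    coolant_temp = 70 + 0.01 * engine_rpm + 0.1 * throttle_pct + rng.normal(0, 1, n)

    X = np.column_stack([engine_rpm, throttle_pct])      # surviving channels
    y = coolant_temp                                     # channel assumed compromised

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print("R^2 on held-out data:", round(model.score(X_te, y_te), 3))
    ```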