Browsing by Author "Ray, Indrakshi, committee member"

Item Open Access
A heuristic-based approach to automatically extract personalized attack graph related concepts from vulnerability descriptions (Colorado State University. Libraries, 2017)
Mukherjee, Subhojeet, author; Ray, Indrajit, advisor; Ray, Indrakshi, committee member; Byrne, Zinta, committee member
Computer users are not safe, be it at home or in public places. Public networks are most often administered by trained individuals who attempt to fortify those networks using strong administrative skills, state-of-the-art security tools and meticulous vigilance. This is, however, not true for home computer users. Being largely untrained, they are often the most likely targets of cyber attacks. These attacks are often executed in cleverly interleaved sequences leading to the eventual goal of the attacker. The Personalized Attack Graphs (PAG) introduced by Urbanska et al. [24, 25, 32] can leverage the interplay of system configurations, attacker and user actions to represent a cleverly interleaved sequence of attacks on a single system. An instance of the PAG can be generated manually by observing system configurations of a computer and collating them with possible security threats which can exploit existing system vulnerabilities and/or misconfigurations. However, the amount of manual labor involved in creating and periodically updating the PAG can be very high. As a result, attempts should be made to automate the process of generating the PAG. Information required to generate these graphs is available on the Internet in the form of vulnerability descriptions. This information is, however, almost always written in natural language and lacks any form of structure. In this thesis, we propose an unsupervised heuristic-based approach which parses vulnerability descriptions and extracts instances of PAG-related concepts like system configurations, attacker and user actions. Extracted concepts can then be interleaved to generate the Personalized Attack Graph.
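
The extraction step described in this entry lends itself to a small illustration. The following is a minimal sketch of the general idea of heuristic, keyword-driven concept extraction from a vulnerability description; it is not the thesis's actual parser, and the cue-word lists and the sample sentence are invented for illustration.

```python
import re

# Illustrative cue words only; the thesis derives its heuristics from
# linguistic analysis of real vulnerability descriptions.
CUES = {
    "attacker_action": ["attacker", "remote attackers", "execute", "inject", "bypass"],
    "user_action": ["user opens", "user clicks", "victim visits", "user downloads"],
    "system_configuration": ["version", "prior to", "when configured", "enabled", "installed"],
}

def extract_concepts(description: str) -> dict:
    """Tag each sentence with the PAG-related concept classes its cue words suggest."""
    concepts = {label: [] for label in CUES}
    for sentence in re.split(r"(?<=[.!?])\s+", description.strip()):
        lowered = sentence.lower()
        for label, cues in CUES.items():
            if any(cue in lowered for cue in cues):
                concepts[label].append(sentence)
    return concepts

# Invented example text, loosely in the style of a CVE description.
sample = ("The plugin prior to version 2.1 allows remote attackers to inject "
          "arbitrary script when a user clicks a crafted link.")
for label, sentences in extract_concepts(sample).items():
    print(label, "->", sentences)
```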

Item Open Access
A novel approach to statistical problems without identifiability (Colorado State University. Libraries, 2024)
Adams, Addison D., author; Wang, Haonan, advisor; Zhou, Tianjian, advisor; Kokoszka, Piotr, committee member; Shaby, Ben, committee member; Ray, Indrakshi, committee member
In this dissertation, we propose novel approaches to random coefficient regression (RCR) and the recovery of mixing distributions under nonidentifiable scenarios. The RCR model is an extension of the classical linear regression model that accounts for individual variation by treating the regression coefficients as random variables. A major interest lies in the estimation of the joint probability distribution of these random coefficients based on the observable samples of the outcome variable evaluated for different values of the explanatory variables. In Chapter 2, we consider fixed-design RCR models, under which the coefficient distribution is not identifiable. To tackle the challenges of nonidentifiability, we consider an equivalence class, in which each element is a plausible coefficient distribution that, for each value of the explanatory variables, yields the same distribution for the outcome variable. In particular, we formulate the approximations of the coefficient distributions as a collection of stochastic inverse problems, allowing for a more flexible nonparametric approach with minimal assumptions. An iterative approach is proposed to approximate the elements by incorporating an initial guess of a solution called the global ansatz. We further study its convergence and demonstrate its performance through simulation studies. The proposed approach is applied to a real data set from an acupuncture clinical trial. In Chapter 3, we consider the problem of recovering a mixing distribution, given a component distribution family and observations from a compound distribution. Most existing methods are restricted in scope in that they are developed for certain component distribution families or continuity structures of mixing distributions. We propose a new, flexible nonparametric approach with minimal assumptions. Our proposed method iteratively steps closer to the desired mixing distribution, starting from a user-specified distribution, and we further establish its convergence properties. Simulation studies are conducted to examine the performance of our proposed method. In addition, we demonstrate the utility of our proposed method through its application to two sets of real-world data, including prostate cancer data and Shakespeare's canon word count.

Item Open Access
A scenario-based technique to analyze UML design class models (Colorado State University. Libraries, 2014)
Yu, Lijun, author; France, Robert B., advisor; Ray, Indrakshi, committee member; Ghosh, Sudipto, committee member; Malaiya, Yashwant, committee member; Turk, Dan, committee member
Identifying and resolving design problems in the early design phases can help reduce the number of design errors in implementations. In this dissertation, a tool-supported lightweight static analysis technique is proposed to rigorously analyze UML design class models that include operations specified using the Object Constraint Language (OCL). A UML design class model is analyzed against a given set of scenarios that describe desired or undesired behaviors. The technique can leverage existing class model analysis tools such as USE and OCLE. The analysis technique is lightweight in that it analyzes functionality specified in a UML design class model within the scope of a given set of scenarios. It is static because it does not require that the UML design class model be executable. The technique is used to (1) transform a UML design class model to a snapshot transition model that captures valid state transitions, (2) transform given scenarios to snapshot transitions, and (3) determine whether or not the snapshot transitions conform to the snapshot transition model. A design inconsistency exists if snapshot transitions that represent desired behaviors do not conform to the snapshot transition model, or if snapshot transitions representing undesired behaviors conform to the snapshot transition model. A Scenario-based UML Design Analysis tool was developed using Kermeta and the Eclipse Modeling Framework. The tool can be used to transform an Ecore design class model to a snapshot transition model and transform scenarios to snapshot transitions. The tool is integrated with the USE analysis tool. We used the Scenario-based UML Design Analysis technique to analyze two design class models: a Train Management System model and a Generalized Spatio-Temporal RBAC model. The two demonstration case studies show how the technique can be used to analyze the inconsistencies between UML design class models and scenarios. We performed a pilot study to evaluate the effectiveness of the Scenario-based UML Design Analysis technique. In the pilot study, the technique uncovered at least as many design inconsistencies as manual inspection techniques uncovered, and the technique did not uncover false inconsistencies. The pilot study provides some evidence that the Scenario-based UML Design Analysis technique is effective. The dissertation also proposes two scenario generation techniques. These techniques can be used to ease the manual effort needed to produce scenarios. The scenario generation techniques can be used to automatically generate a family of scenarios that conform to specified scenario generation criteria.
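
As a schematic illustration of step (3) above, the check below treats both the snapshot transition model and the scenario-derived transitions as sets of (source snapshot, operation, target snapshot) triples and reports nonconforming desired scenarios and conforming undesired ones. The state labels and operations are invented, and the real tool of course works on Ecore/OCL models rather than strings.

```python
# Snapshot transitions are abstracted here as (source, operation, target) triples.
# These particular states and operations are made up for illustration.
model_transitions = {
    ("train_stopped", "openDoors", "doors_open"),
    ("doors_open", "closeDoors", "train_stopped"),
    ("train_stopped", "depart", "train_moving"),
}

desired = {("train_stopped", "openDoors", "doors_open")}
undesired = {("train_moving", "openDoors", "doors_open")}

def check(model, desired, undesired):
    """Report inconsistencies in the sense used by the scenario-based analysis."""
    problems = []
    for t in desired:
        if t not in model:
            problems.append(("desired scenario not realizable", t))
    for t in undesired:
        if t in model:
            problems.append(("undesired scenario allowed", t))
    return problems

for kind, transition in check(model_transitions, desired, undesired):
    print(kind, transition)   # prints nothing here: the toy model is consistent
```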

Item Open Access
Anchor centric virtual coordinate systems in wireless sensor networks: from self-organization to network awareness (Colorado State University. Libraries, 2012)
Dhanapala, Dulanjalie C., author; Jayasumana, Anura P., advisor; Kirby, Michael, committee member; Pezeshki, Ali, committee member; Ray, Indrakshi, committee member
Future Wireless Sensor Networks (WSNs) will be collections of thousands to millions of sensor nodes, automated to self-organize, adapt, and collaborate to facilitate distributed monitoring and actuation. They may even be deployed over harsh geographical terrains and 3D structures. Low-cost sensor nodes that facilitate such massive-scale networks have stringent resource constraints (e.g., in memory and energy) and limited capabilities (e.g., in communication range and computational power). Economic constraints exclude the use of expensive hardware such as Global Positioning Systems (GPSs) for network organization and structuring in many WSN applications. Alternatives that depend on signal strength measurements are highly sensitive to noise and fading, and thus often are not pragmatic for network organization. Robust, scalable, and efficient algorithms for network organization and reliable information exchange that overcome the above limitations without degrading the network's lifespan are vital for facilitating future large-scale WSNs. This research develops fundamental algorithms and techniques targeting self-organization, data dissemination, and discovery of physical properties such as boundaries of large-scale WSNs without the need for costly physical position information. Our approach is based on Anchor Centric Virtual Coordinate Systems, commonly called Virtual Coordinate Systems (VCSs), in which each node is characterized by a coordinate vector of shortest path hop distances to a set of anchor nodes. We develop and evaluate algorithms and techniques for the following tasks associated with use of VCSs in WSNs: (a) novelty analysis of each anchor coordinate and compressed representation of VCSs; (b) regaining lost directionality and identifying a 'good' set of anchors; (c) generating topology preserving maps (TPMs); (d) efficient and reliable data dissemination, and boundary identification without physical information; and (e) achieving network awareness at individual nodes. After investigating properties and issues related to VCS, a Directional VCS (DVCS) is proposed based on a novel transformation that restores the lost directionality information in VCS. Extreme Node Search (ENS), a novel and efficient anchor placement scheme, starts with two randomly placed anchors and then uses this directional transformation to identify the number and placement of anchors in a completely distributed manner. Furthermore, a novelty-filtering-based approach for identifying a set of 'good' anchors that reduces the overhead and power consumption in routing is discussed. Physical layout information such as physical voids and even relative physical positions of sensor nodes with respect to X-Y directions are absent in a VCS description. Obtaining such information independent of physical information or signal strength measurements has not been possible until now. Two novel techniques to extract Topology Preserving Maps (TPMs) from VCS, based on Singular Value Decomposition (SVD) and DVCS, are presented. A TPM is a distorted version of the layout of the network, but one that preserves the neighborhood information of the network. The generalized SVD-based TPM scheme for 3D networks provides TPMs even in situations where obtaining accurate physical information is not possible. The ability to restore directionality and topology-based Cartesian coordinates makes VCS competitive and, in many cases, a better alternative to geographic coordinates. This is demonstrated using two novel routing schemes in the VC domain that outperform the well-known physical information-based routing schemes. The first scheme, DVC Routing (DVCR), uses the directionality recovered by DVCS. Geo-Logical Routing (GLR) is a technique that combines the advantages of geographic and logical routing to achieve higher routability at a lower cost by alternating between topology and virtual coordinate spaces to overcome local minima in the two domains. GLR uses topology domain coordinates derived solely from VCS as a better alternative for physical location information. A boundary detection scheme that is capable of identifying physical boundaries even for 3D surfaces is also proposed. "Network awareness" is a node's cognition of its neighborhood, its position in the network, and the network-wide status of the sensed phenomena. A novel technique is presented whereby a node achieves network awareness by passively listening to routine messages associated with applications in large-scale WSNs. With the knowledge of the network topology and phenomena distribution, every node is capable of making solo decisions that are more sensible and intelligent, thereby improving overall network performance, efficiency, and lifespan. In essence, this research has laid a firm foundation for use of Anchor Centric Virtual Coordinate Systems in WSN applications, without the need for physical coordinates. Topology coordinates, derived from virtual coordinates, provide a novel, economical, and in many cases, a better alternative to physical coordinates. A novel concept of network awareness at nodes is demonstrated.
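
The coordinate construction described in this entry can be illustrated in a few lines: each node's virtual coordinate is its vector of hop distances to the chosen anchors. The sketch below computes these vectors with breadth-first search on a toy adjacency list; the graph and anchor choice are invented, and anchor selection schemes such as ENS are not reproduced here.

```python
from collections import deque

def hop_distances(adj, source):
    """Shortest-path hop counts from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def virtual_coordinates(adj, anchors):
    """Virtual coordinate of a node = tuple of hop distances to each anchor."""
    per_anchor = [hop_distances(adj, a) for a in anchors]
    return {node: tuple(d[node] for d in per_anchor) for node in adj}

# Toy 6-node topology; nodes 0 and 5 stand in for anchors picked by some scheme.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
print(virtual_coordinates(adj, anchors=[0, 5]))
# e.g. node 3 -> (2, 2): two hops from each anchor
```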

Item Open Access
Anomaly detection with machine learning for automotive cyber-physical systems (Colorado State University. Libraries, 2022)
Thiruloga, Sooryaa Vignesh, author; Pasricha, Sudeep, advisor; Kim, Ryan, committee member; Ray, Indrakshi, committee member
Today's automotive systems are evolving at a rapid pace, and there has been a seismic shift in automotive technology in the past few years. Automakers are racing to redefine the automobile as a fully autonomous and connected system. As a result, new technologies such as advanced driver assistance systems (ADAS), vehicle-to-vehicle (V2V), 5G vehicle-to-infrastructure (V2I), and vehicle-to-everything (V2X) have emerged in recent years. These advances have resulted in increased responsibilities for the electronic control units (ECUs) in the vehicles, requiring a more sophisticated in-vehicle network to address the growing communication needs of ECUs with each other and external subsystems. This in turn has transformed modern vehicles into a complex distributed cyber-physical system. The ever-growing connectivity to external systems in such vehicles is introducing new challenges, related to the increasing vulnerability of such vehicles to various cyber-attacks. A malicious actor can use various access points in a vehicle, e.g., Bluetooth and USB ports, telematic systems, and OBD-II ports, to gain unauthorized access to the in-vehicle network. These access points are used to gain access to the network from the vehicle's attack surface. After gaining access to the in-vehicle network through an attack surface, a malicious actor can inject or alter messages on the network to try to take control of the vehicle. Traditional security mechanisms such as firewalls only detect simple attacks, as they do not have the ability to detect more complex attacks. With the increasing complexity of vehicles, the attack surface increases, paving the way for more complex and novel attacks in the future. Thus, there is a need for an advanced attack detection solution that can actively monitor the in-vehicle network and detect complex cyber-attacks. One of the many approaches to achieve this is by using an intrusion detection system (IDS). Many state-of-the-art IDSs employ machine learning algorithms to detect cyber-attacks due to their ability to detect both previously observed as well as novel attack patterns. Moreover, the large availability of in-vehicle network data and the increasing computational power of the ECUs to handle emerging complex automotive tasks facilitate the use of machine learning models. Therefore, due to their large spectrum of attack coverage and ability to detect complex attack patterns, we adopt and propose two novel machine learning-based IDS frameworks (LATTE and TENET) for in-vehicle network anomaly detection. Our proposed LATTE framework uses sequence models, such as LSTMs, in an unsupervised setting to learn the normal system behavior. LATTE leverages the learned information at runtime to detect anomalies by observing deviations from the learned normal behavior. Our proposed LATTE framework aims to maximize the anomaly detection accuracy, precision, and recall while minimizing the false-positive rate. The increased complexity of automotive systems has resulted in very long-term dependencies between messages which cannot be effectively captured by LSTMs. Hence, to overcome this problem, we propose a novel IDS framework called TENET. TENET employs a novel temporal convolutional neural attention (TCNA) based architecture to effectively learn very long-term dependencies between messages in an in-vehicle network during the training phase and leverage the learned information in combination with a decision tree classifier to detect anomalous messages. Our work aims to efficiently detect a multitude of attacks in the in-vehicle network with low memory and computational overhead on the ECU.
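
The detection principle LATTE relies on, learning normal message behavior and flagging deviations at runtime, can be shown in miniature. The sketch below is not the LSTM-based model from the thesis; it substitutes a trivial moving-average predictor for the learned sequence model and flags values whose prediction error exceeds a threshold, purely to illustrate the detect-by-deviation idea. The signal values and threshold are invented.

```python
def detect_anomalies(values, window=5, threshold=8.0):
    """Flag indices whose value deviates too far from a moving-average prediction.

    A stand-in for a learned sequence model: predict the next value as the mean
    of the previous `window` values and treat a large error as an anomaly.
    """
    anomalies = []
    for i in range(window, len(values)):
        prediction = sum(values[i - window:i]) / window
        if abs(values[i] - prediction) > threshold:
            anomalies.append(i)
    return anomalies

# Toy periodic "sensor" signal with one injected spike at index 12.
signal = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 10, 12, 30, 11, 10]
print(detect_anomalies(signal))   # -> [12]
```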

Item Open Access
Applying static code analysis to firewall policies for the purpose of anomaly detection (Colorado State University. Libraries, 2009)
Zaliva, Vadim, author; Ray, Indrajit, 1966-, advisor; Turk, Daniel E., committee member; Ray, Indrakshi, committee member
Treating modern firewall policy languages as imperative, special-purpose programming languages, in this thesis we will try to apply static code analysis techniques for the purpose of anomaly detection. We will first abstract a policy in a common firewall policy language into an intermediate language, and then we will try to apply anomaly detection algorithms to it. The contributions made by this thesis are:
1. An analysis of various control flow instructions in popular firewall policy languages.
2. Introduction of an intermediate firewall policy language, with emphasis on control flow constructs.
3. Application of static code analysis to detect anomalies in a firewall policy expressed in the intermediate firewall policy language.
4. A sample implementation of static code analysis of firewall policies, expressed in our abstract language, using the Datalog language.
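
One of the classic anomalies such an analysis looks for is rule shadowing, where an earlier rule matches everything a later rule would match, so the later rule can never fire. The sketch below checks for that condition over a deliberately simplified rule representation; it only illustrates the kind of anomaly detection meant here, not the thesis's intermediate language or its Datalog implementation, and the example rules are invented.

```python
from ipaddress import ip_network

# A rule here is (source network, destination port or None for any, action).
# Real policies have many more fields; this is enough to show shadowing.
rules = [
    ("10.0.0.0/8", None, "deny"),
    ("10.1.2.0/24", 22, "allow"),     # shadowed: rule 0 already denies all of 10/8
    ("192.168.0.0/16", 80, "allow"),
]

def covers(general, specific):
    """True if `general` matches every packet that `specific` matches."""
    g_net, g_port, _ = general
    s_net, s_port, _ = specific
    net_ok = ip_network(s_net).subnet_of(ip_network(g_net))
    port_ok = g_port is None or g_port == s_port
    return net_ok and port_ok

def shadowed_rules(rules):
    """Indices of rules fully covered by an earlier rule with a different action."""
    findings = []
    for i, later in enumerate(rules):
        for earlier in rules[:i]:
            if covers(earlier, later) and earlier[2] != later[2]:
                findings.append(i)
                break
    return findings

print(shadowed_rules(rules))   # -> [1]
```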

Item Open Access
Automating investigative pattern detection using machine learning & graph pattern matching techniques (Colorado State University. Libraries, 2022)
Muramudalige, Shashika R., author; Jayasumana, Anura P., advisor; Ray, Indrakshi, committee member; Kim, Ryan G., committee member; Wang, Haonan, committee member
Identification and analysis of latent and emergent behavioral patterns are core tasks in investigative domains such as homeland security, counterterrorism, and crime prevention. Development of behavioral trajectory models associated with radicalization and tracking individuals and groups based on such trajectories are critical for law enforcement investigations, but these are hampered by the sheer volume and nature of data that need to be mined and processed. Dynamic and complex behaviors of extremists and extremist groups, missing or incomplete information, and lack of intelligent tools further obstruct counterterrorism efforts. Our research is aimed at developing state-of-the-art computational tools while building on recent advances in machine learning, natural language processing (NLP), and graph databases. In this work, we address the challenges of investigative pattern detection by developing algorithms, tools, and techniques primarily aimed at behavioral pattern tracking and identification for domestic radicalization. The methods developed are integrated into a framework, the Investigative Pattern Detection Framework for Counterterrorism (INSPECT). INSPECT includes components for extracting information using NLP techniques, information networks to store in appropriate databases while enabling investigative graph searches, and data synthesis via generative adversarial techniques to overcome limitations due to incomplete and sparse data. These components enable streamlining investigative pattern detection while accommodating various use cases and datasets. While our outcomes are beneficial for law enforcement and counterterrorism applications to counteract the threat of violent extremism, as the results presented demonstrate, the proposed framework is adaptable to diverse behavioral pattern analysis domains such as consumer analytics, cybersecurity, and behavioral health. Information on radicalization activity and participant profiles of interest to investigative tasks is mostly found in disparate text sources. We integrate NLP approaches such as named entity recognition (NER), coreference resolution, and multi-label text classification to extract structured information regarding behavioral indicators, temporal details, and other metadata. We further use multiple text pre-processing approaches to improve the accuracy of data extraction. Our training text datasets are intrinsically small and label-wise imbalanced, which hinders the direct application of NLP techniques. We use a transfer learning-based, pre-trained NLP model by integrating our specific datasets and achieve noteworthy improvement in information extraction. The extracted information from text sources represents a rich knowledge network of populations with various types of connections that needs to be stored, updated, and repeatedly inspected for emergence of patterns in the long term. Therefore, we utilize graph databases as the foremost storage option while maintaining the reliability and scalability of behavioral data processing. To query suspicious and vulnerable individuals or groups, we implement investigative graph search algorithms as custom stored procedures on top of graph databases while verifying the ability to operate at scale. We use datasets in different contexts to demonstrate the wide-range applicability and the enhanced effectiveness of observing suspicious or latent trends using our investigative graph searches. Investigative data by nature is incomplete and sparse, and the number of cases that may be used for training investigators or machine learning algorithms is small. This is an inherent concern in investigative and many other contexts where data collection is tedious and available data is limited and may also be subject to privacy concerns. Having large datasets is beneficial to social scientists and investigative authorities to enhance their skills and to achieve more accuracy and reliability. A sufficiently large training data volume is also essential for application of the latest machine learning techniques for improved classification and detection. In this work, we propose a generative adversarial network (GAN) based approach with novel feature mapping techniques to synthesize additional data from a small and sparse data set while preserving its statistical characteristics. We also compare our proposed method with two likelihood-based approaches, i.e., multivariate Gaussian and regular-vine copulas. We verify the robustness of the proposed technique via a simulation and real-world datasets representing diverse domains. The proposed GAN-based data generation approach is applicable to other domains, as demonstrated with two applications. First, we extend our data generation approach by contributing to a computer security application, resulting in improved phishing website detection with synthesized datasets. We merge measured datasets with synthesized samples and re-train models to improve the performance of classification models and mitigate vulnerability against adversarial samples. The second is a video traffic classification application in which the data sets are enhanced while preserving statistical similarity between the actual and synthesized datasets. For the video traffic data generation, we modified our data generation technique to capture the temporal patterns in time series data. In this application, we integrate a Wasserstein GAN (WGAN) by using different snapshots of the same video signal with feature-mapping techniques. A trace splitting algorithm is presented for training data of video traces that exhibit higher data throughput with high bursts at the beginning of the video session compared to the rest of the session. With synthesized data, we obtain 5-15% accuracy improvement for classification compared to only having actual traces. The INSPECT framework is validated primarily by mining detailed forensic biographies of known jihadists, which are extensively used by social/political scientists.
Additionally, each component in the framework is extensively validated with a Human-In-The-Loop (HITL) process, which improves the reliability and accuracy of machine learning models, investigative graph algorithms, and other computing tools based on feedback from social scientists. The entire framework is embedded in a modular architecture where the analytical components are implemented independently and are adjustable for different requirements and datasets. We verified the proposed framework's reliability, scalability, and generalizability with datasets in different domains. This research also makes a significant contribution to discrete and sparse data generation in diverse application domains with novel generative adversarial data synthesizing techniques.

Item Open Access
Calibration of CSU CHIVO radar during the RELAMPAGO campaign (Colorado State University. Libraries, 2022)
Kim, Juhyup, author; Chandrasekaran, V., advisor; Ray, Indrakshi, committee member; Cheney, Margaret, committee member
The Colorado State University C-band Hydrometeorological Instrument for Volumetric Observation (CSU CHIVO) radar is a dual-polarization weather radar operated by Colorado State University. The CHIVO radar is easy to transport and deploy compared to conventional S-band radars: it can be disassembled, shipped, and re-assembled to observe weather phenomena at different locations in the world. During the Remote Sensing of Electrification, Lightning, and Mesoscale/Microscale Processes with Adaptive Ground Observations (RELAMPAGO) field campaign, the CHIVO radar was deployed to the Córdoba and Mendoza provinces in Argentina and operated during two observing periods: one from November 10, 2018, to December 22, 2018, and another from December 27, 2018, to January 31, 2019. Any high-quality research radar requires proper calibration to ensure high data quality. To address the requirements associated with a high-quality weather radar, this thesis presents three aspects of radar calibration, namely (a) azimuth, which indicates the horizontal position of targets; (b) reflectivity (Z), which indicates the returned power at horizontal polarization; and (c) differential reflectivity (ZDR), which indicates the ratio of the returned power at horizontal polarization to that at vertical polarization. The calibration techniques presented in this thesis utilize the sun as a calibration source, ground targets, and meteorological targets. These three techniques are applied appropriately to analyze and calibrate the radar data sets. The goal of the radar calibrations was to improve the data quality and provide researchers with accurate data sets so that weather phenomena under different geographical and climatic conditions can be properly studied and understood.

Item Open Access
Characterizing the visible address space to enable efficient continuous IP geolocation (Colorado State University. Libraries, 2020)
Gharaibeh, Manaf, author; Papadopoulos, Christos, advisor; Partridge, Craig, advisor; Heidemann, John, committee member; Ray, Indrakshi, committee member; Hayne, Stephen, committee member
Internet Protocol (IP) geolocation is vital for location-dependent applications and many network research problems. The benefits to applications include enabling content customization, proximal server selection, and management of digital rights based on the location of users, to name a few. The benefits to networking research include providing geographic context useful for several purposes, such as studying the geographic deployment of Internet resources, binding cloud data to a location, and studying censorship and monitoring, among others. Measurement-based IP geolocation is widely considered the state-of-the-art client-independent approach to estimate the location of an IP address. However, full measurement-based geolocation is prohibitive when applied continuously to the entire Internet to maintain up-to-date IP-to-location mappings. Furthermore, many IP address blocks rarely move, making it unnecessary to perform such full geolocation. The thesis of this dissertation states that we can enable efficient, continuous IP geolocation by identifying clusters of co-located IP addresses and their location stability from latency observations. In this statement, a cluster indicates a group of an arbitrary number of adjacent co-located IP addresses (a few up to a /16). Location stability indicates a measure of how often an IP block changes location. We gain efficiency by allowing IP geolocation systems to geolocate IP addresses as units, and by detecting when a geolocation update is required, optimizations not explored in prior work. We present several studies to support this thesis statement. We first present a study to evaluate the reliability of router geolocation in popular geolocation services, complementing prior work that evaluates end-host geolocation in such services. The results show the limitations of these services and the need for better solutions, motivating our work to enable more accurate approaches. Second, we present a method to identify clusters of co-located IP addresses by the similarity in their latency. Identifying such clusters allows us to geolocate them efficiently as units without compromising accuracy. Third, we present an efficient delay-based method to identify IP blocks that move over time, allowing us to recognize when geolocation updates are needed and avoid frequent geolocation of the entire Internet to maintain up-to-date geolocation. In our final study, we present a method to identify cellular blocks by their distinctive variation in latency compared to WiFi and wired blocks. Our method to identify cellular blocks allows a better interpretation of their latency estimates and a study of their geographic properties without the need for proprietary data from operators or users.
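
The clustering step in the second study can be illustrated with a toy sketch: walk adjacent addresses in a block in order and start a new cluster whenever the latency estimate jumps by more than a tolerance. This is only a schematic of grouping adjacent addresses by latency similarity, not the dissertation's actual method, and the RTT values are invented.

```python
def cluster_by_latency(rtts, tolerance_ms=5.0):
    """Group consecutive (address, rtt) pairs whose RTTs stay within a tolerance.

    `rtts` is a list of (last_octet, median_rtt_ms) for adjacent addresses.
    Returns a list of clusters, each a list of last octets.
    """
    clusters = [[rtts[0][0]]]
    for (addr, rtt), (_, prev_rtt) in zip(rtts[1:], rtts):
        if abs(rtt - prev_rtt) <= tolerance_ms:
            clusters[-1].append(addr)
        else:
            clusters.append([addr])
    return clusters

# Invented example: a latency step suggests two co-located groups in the block.
measurements = [(1, 20.1), (2, 21.0), (3, 19.8), (4, 52.3), (5, 51.7), (6, 53.0)]
print(cluster_by_latency(measurements))   # -> [[1, 2, 3], [4, 5, 6]]
```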

Item Open Access
Detecting advanced botnets in enterprise networks (Colorado State University. Libraries, 2017)
Zhang, Han, author; Papadopoulos, Christos, advisor; Ray, Indrakshi, committee member; Pallickara, Shrideep, committee member; Hayne, Stephen C., committee member
A botnet is a network composed of compromised computers that are controlled by a botmaster through a command and control (C&C) channel. Botnets are more destructive than common viruses and malware, because they control the resources of many compromised computers. Botnets provide a very important platform for attacks, such as Distributed Denial-of-Service (DDoS), spamming, scanning, and many more. To foil detection systems, botnets began to use various evasion techniques, including encrypted communications, dynamically generated C&C domains, and more. We call such botnets that use evasion techniques advanced botnets. In this dissertation, we introduce various algorithms and systems to detect advanced botnets in an enterprise-like network environment. Encrypted botnets introduce several problems to detection. First, to enable research in detecting encrypted botnets, researchers need samples of encrypted botnet traces with ground truth, which are very hard to get. Traces that are available are not customizable, which prevents testing under various controlled scenarios. To address this problem we introduce BotTalker, a tool that can be used to generate customized encrypted botnet communication traffic. BotTalker emulates the actions a bot would take to encrypt communication. To the best of our knowledge, BotTalker is the first work that provides users with customized encrypted botnet traffic. The second problem introduced by encrypted botnets is that Deep Packet Inspection (DPI)-based security systems are foiled. We measure the effects of encryption on three security systems, including Snort, Suricata, and BotHunter (BH), using the encrypted botnet traffic generated by BotTalker. The results show that encryption foils these systems greatly. Then, we introduce a method to detect encrypted botnet traffic based on the fact that encryption increases data's entropy. In particular, we present two high-entropy (HE) classifiers and add one of them to enhance BH by utilizing the other detectors it provides. By doing this, the HE classifier restores BH's ability to detect bots, even when they use encryption. Entropy calculation at line speed is expensive, especially when the flows are very long. To deal with this issue, we introduce two algorithms to classify flows as HE by looking at only part of a flow. In particular, we classify a flow as HE or low entropy (LE) by only considering the first M packets of the flow. These early HE classifiers are used in two ways: (a) to improve the speed of bot detection tools, and (b) as a filter to reduce the load on an Intrusion Detection System (IDS). We implement the filter as a preprocessor in Snort. The results show that by using the first 15 packets of a flow the traffic delivered to the IDS is reduced by more than 50% while maintaining more than 99.9% of the original alerts. Comparing our traffic reduction scheme with other work, we find that they need to inspect at least 13 times more packets than ours or they miss about 70 times as many alerts. To improve the resiliency of communication between bots and C&C servers, botmasters began utilizing Domain Generation Algorithms (DGA). The DGA technique avoids static blacklists as well as prevents security specialists from registering the C&C domain before the botmaster. We introduce BotDigger, a system that detects DGA-based bots using DNS traffic without a priori knowledge of the domain generation algorithm. BotDigger utilizes a chain of evidence, including quantity, temporal, and linguistic evidence, to detect an individual bot by only monitoring traffic at the DNS servers of a single network. We evaluate BotDigger's performance using traces from two DGA-based botnets, Kraken and Conficker, as well as a one-week DNS trace captured from our university and three traces collected from our research lab. Our results show that BotDigger detects all the Kraken bots and 99.8% of Conficker bots with very low false positives.
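
The early high-entropy classification idea, labeling a flow after only its first M packets, is easy to sketch. The code below computes the Shannon byte entropy of the concatenated payloads of the first M packets and compares it to a threshold; the threshold, M, and the example payloads are illustrative choices, not the values or classifiers developed in the dissertation.

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def is_high_entropy(packets, first_m=15, threshold=6.5):
    """Classify a flow as high entropy (HE) from the payloads of its first M packets."""
    sample = b"".join(packets[:first_m])
    return byte_entropy(sample) > threshold

# Toy flows: repetitive plaintext-like payloads vs. pseudo-random "encrypted" ones.
plaintext_flow = [b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"] * 20
encrypted_flow = [os.urandom(48) for _ in range(20)]
print(is_high_entropy(plaintext_flow), is_high_entropy(encrypted_flow))  # False True
```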

Item Open Access
Detecting non-secure memory deallocation with CBMC (Colorado State University. Libraries, 2021)
Singh, Mohit K., author; Prabhu, Vinayak, advisor; Ray, Indrajit, advisor; Ghosh, Sudipto, committee member; Ray, Indrakshi, committee member; Simske, Steve, committee member
Scrubbing sensitive data before releasing memory is a widely recommended but often ignored programming practice for developing secure software. Consequently, sensitive data such as cryptographic keys, passwords, and personal data can remain in memory indefinitely, thereby increasing the risk of exposure to hackers who can retrieve the data using memory dumps or exploit vulnerabilities such as Heartbleed and Etherleak. We propose an approach for detecting a specific memory safety bug called Improper Clearing of Heap Memory Before Release, referred to as Common Weakness Enumeration 244. The CWE-244 bug in a program allows the leakage of confidential information when a variable is not wiped before heap memory is freed. Our approach uses the CBMC model checker to detect this weakness and is based on instrumenting the program using (1) global variable declarations that track and monitor the state of the program variables relevant for CWE-244, and (2) assertions that help CBMC to detect unscrubbed memory. We develop a tool, SecMD-Checker, implementing our instrumentation-based algorithm, and we provide experimental validation on the Juliet Test Suite that the tool is able to detect all the CWE-244 instances present in the test suite. The proposed approach has the potential to work with other model checkers and can be extended for detecting other weaknesses that require variable tracking and monitoring, such as CWE-226, CWE-319, and CWE-1239.

Item Embargo
Interaction and navigation in cross-reality analytics (Colorado State University. Libraries, 2024)
Zhou, Xiaoyan, author; Ortega, Francisco, advisor; Ray, Indrakshi, committee member; Moraes, Marcia, committee member; Batmaz, Anil Ufuk, committee member; Malinin, Laura, committee member
Along with immersive display technology's fast evolution, augmented reality (AR) and virtual reality (VR) are increasingly being researched to facilitate data analytics, known as Immersive Analytics. The ability to interact with data visualization in the space around users not only builds the foundation of ubiquitous analytics but also assists users in the sensemaking of the data. However, interaction and navigation while making sense of 3D data visualization in different realities still need to be better understood and explored. For example, what are the differences between users interacting in augmented and virtual reality, and how can we utilize them in the best way during analysis tasks? Moreover, based on the existing work and our preliminary studies, interaction efficiency with immersive displays still needs to be improved. Therefore, this thesis focuses on understanding interaction and navigation in augmented reality and virtual reality for immersive analytics. First, we explored how users interact with multiple objects in augmented reality by using the "Wizard of Oz" study approach. We elicited multimodal interactions involving hand gestures and speech, with text prompts shown on the head-mounted display. Then, we compared the results with previous work in a single-object scenario, which helped us better understand how users prefer to interact in a more complex AR environment. Second, we built an immersive analytics platform in both AR and VR environments to simulate a realistic scenario and conducted a controlled study to evaluate user performance with designed analysis tools and 3D data visualization. Based on the results, interaction and navigation patterns were observed and analyzed for a better understanding of user preferences during the sensemaking process. Lastly, by considering the findings and insights from prior studies, we developed a hybrid user interface in simulated cross-reality for situated analytics. An exploratory study was conducted with a smart home setting to understand user interaction and navigation in a more familiar scenario with practical tasks. With the results, we did a thorough qualitative analysis of feedback and video recordings to reveal user preferences with interaction and visualization in situated analytics in the everyday decision-making scenario. In conclusion, this thesis uncovered user-designed multimodal interaction including mid-air hand gestures and speech for AR, users' interaction and navigation strategies in immersive analytics in both AR and VR, and hybrid user interface usage in situated analytics for assisting decision-making. Our findings and insights in this thesis provide guidelines and inspiration for future research in interaction and navigation design and improving user experience with analytics in mixed-reality environments.

Item Embargo
Microphysical retrieval in severe storms from ground-based and space-borne radar network: application to La Plata region in South America (Colorado State University. Libraries, 2023)
Arias Hernández, Iván D., author; Chandrasekar, V., advisor; Cheney, Margaret, committee member; Ray, Indrakshi, committee member; Chávez, José, committee member
The microphysics of severe weather is studied using a network approach from multiple platform observations. Observations acquired near the foothills of the Andes in Argentina are used in this investigation. The La Plata region in Argentina is known for having some of the tallest storms on Earth. During the austral summer of 2018, a network of radars was deployed in this region to study these storms as part of the RELAMPAGO field experiment. This network of ground-based radars, in addition to satellite and in-situ observations, is used to understand the microphysics of severe storms in this part of the world. The knowledge gained from studying the microphysics of these storms in South America is applied to understand convection more broadly. In addition, these multiple platform observations are used to understand how the storms in South America may differ from storms in other regions. The analysis from simultaneous radar observations is used to self-calibrate the radar network. In this investigation, first, an extensive calibration of the radar network measurements was performed to obtain high-quality data for this study. The ground-based radars' dual-polarization measurements were calibrated using a network-based approach. In addition, satellite measurements from the GPM radar were used as a common platform for calibrating the ground-based radars in the network. A new parameterization for the attenuation correction is developed for ground-based radar in this region as an outcome of the network calibration exercise. After careful calibration, the radar measurements in the network were used to obtain observational statistics over the RELAMPAGO campaign domain. These statistics are applied to understand the connections among the radar retrievals and to select the severe weather cases to study. For the severe weather cases identified in the radar statistics, spectral polarimetric decomposition from radar signal samples in updraft environments is derived. First, updrafts are identified using dual Doppler analysis. Subsequently, the reflectivity, differential reflectivity, and coherence spectra are computed from radar signal samples. Practical considerations about the computation of the spectrum in updrafts are also presented. The spectral analysis revealed that bimodalities in the spectrum can be found in updraft conditions. In addition, a technique to quantify the attenuation of C-band radar signals in melting ice was developed using multiple radar observations. The attenuation estimates are used to parameterize the specific attenuation in melting ice to explain the enhanced attenuation. Finally, convection-permitting high-resolution simulations are compared with the radar network observations for a representative severe weather case. This comparison is conducted to test the effectiveness of downscaling to better resolve the convective processes that lead to severe weather.

Item Open Access
Multi-criteria analysis in modern information management (Colorado State University. Libraries, 2010)
Dewri, Rinku, author; Whitley, L. Darrell, advisor; Ray, Indrajit, 1966-, advisor; Ray, Indrakshi, committee member; Siegel, Howard Jay, committee member
The past few years have witnessed an overwhelming amount of research in the field of information security and privacy. An encouraging outcome of this research is the vast accumulation of theoretical models that help to capture the various threats that persistently hinder the best possible usage of today's powerful communication infrastructure. While theoretical models are essential to understanding the impact of any breakdown in the infrastructure, they are of limited application if the underlying business-centric view is ignored. Information management in this context is the strategic management of the infrastructure, incorporating the knowledge about causes and consequences to arrive at the right balance between risk and profit. Modern information management systems are home to a vast repository of sensitive personal information. While these systems depend on quality data to boost the Quality of Service (QoS), they also run the risk of violating privacy regulations. The presence of network vulnerabilities also weakens these systems since security policies cannot always be enforced to prevent all forms of exploitation. This problem is more strongly grounded in the insufficient availability of resources, rather than the inability to predict zero-day attacks. System resources also impact the availability of access to information, which in itself is becoming more and more ubiquitous day by day. Information access times in such ubiquitous environments must be maintained within a specified QoS level. In short, modern information management must consider the mutual interactions between risks, resources and services to achieve wide-scale acceptance. This dissertation explores these problems in the context of three important domains, namely disclosure control, security risk management and wireless data broadcasting. Research in these domains has been put together under the umbrella of multi-criteria decision making to signify that "business survival" is an equally important factor to consider while analyzing risks and providing solutions for their resolution. We emphasize that businesses are always bound by constraints in their effort to mitigate risks and therefore benefit the most from a framework that allows the exploration of solutions that abide by the constraints. Towards this end, we revisit the optimization problems being solved in these domains and argue that they overlook the underlying cost-benefit relationship. Our approach in this work is motivated by the inherent multi-objective nature of the problems. We propose formulations that help expose the cost-benefit relationship across the different objectives that must be met in these problems. Such an analysis provides a decision maker with the necessary information to make an informed decision on the impact of choosing a control measure on the business goals of an organization. The theories and tools necessary to perform this analysis are introduced to the community.
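
A concrete way to expose a cost-benefit relationship in a multi-objective setting is to enumerate the non-dominated (Pareto-optimal) choices rather than a single optimum. The sketch below filters a set of candidate security controls, each scored by made-up (cost, residual risk) pairs, down to its Pareto front; it demonstrates the general idea only and is not the dissertation's formulations or algorithms.

```python
def dominates(a, b):
    """True if option `a` is at least as good as `b` on every objective
    (here: lower cost and lower residual risk) and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options):
    """Return the options not dominated by any other option."""
    return {name: score for name, score in options.items()
            if not any(dominates(other, score)
                       for other_name, other in options.items() if other_name != name)}

# Hypothetical control measures scored as (cost in $k, residual risk score).
options = {
    "do nothing": (0, 90),
    "patch quarterly": (10, 60),
    "patch quarterly + legacy AV": (30, 70),
    "patch monthly": (25, 40),
    "patch monthly + IDS": (60, 38),
}
print(pareto_front(options))
# "patch quarterly + legacy AV" costs more and leaves more risk than an
# alternative, so it is dominated and drops out of the trade-off curve.
```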
The second aspect that we look into is that of building resiliency in a large network that consists of several machines that collectively provide a single service to the outside world. Such networks are particularly vulnerable to Sybil attacks. While our Sybil detection algorithms achieve very high levels of accuracy, they cannot guarantee that all Sybils will be detected. Thus, to protect against such "residual" Sybils (that is, those that remain potentially undetected and continue to attack the network services), we propose a novel Moving Target Defense (MTD) paradigm to build resilient networks. The core idea is that for large enterprise level networks, the survivability of the network's mission is more important than the security of one or more of the servers. We develop protocols to re-locate services from server to server in a random way such that before an attacker has an opportunity to target a specific server and disrupt it’s services, the services will migrate to another non-malicious server. The continuity of the service of the large network is thus sustained. We evaluate the effectiveness of our proposed protocols using theoretical analysis, simulations, and experimentation. For the Sybil detection problem we use both synthetic and real-world data sets. We evaluate the algorithms for accuracy of Sybil detection. For the moving target defense protocols we implement a proof-of-concept in the context of access control as a service, and run several large scale simulations. The proof-of- concept demonstrates the effectiveness of the MTD paradigm. We evaluate the computation and communication complexity of the protocols as we scale up to larger and larger networks.Item Open Access On the design of a moving target defense framework for the resiliency of critical services in large distributed networks(Colorado State University. Libraries, 2018) Amarnath, Athith, author; Ray, Indrajit, advisor; Ray, Indrakshi, committee member; Hayne, Stephen, committee memberSecurity is a very serious concern in this era of digital world. Protecting and controlling access to secured data and services has given more emphasis to access control enforcement and management. Where, access control enforcement with strong policies ensures the data confidentiality, availability and integrity, protecting the access control service itself is equally important. When these services are hosted on a single server for a lengthy period of time, the attackers have potentially unlimited time to periodically explore and enumerate the vulnerabilities with respect to the configuration of the server and launch targeted attacks on the service. Constant proliferation of cloud usage and distributed systems over the last decade have materialized the possibilities of distributing data or hosting services over a group of servers located in different geographical locations. Existing election algorithms used to provide service continuity hosted in the distributed setup work well in a benign environment. However, these algorithms are not secure against skillful attackers who intends to manipulate or bring down the data or service. In this thesis, we design and implement the protection of critical services, such as access-control reference monitors, using the concept of moving target defense. 
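
The relocation idea in this entry, moving a service among servers faster than an attacker can profile any one of them, can be illustrated with a toy scheduler. The sketch below picks the next host uniformly at random from the remaining pool each epoch; it is a schematic of random service migration only, not the protocols developed in these dissertations, and the host names and epoch count are invented.

```python
import random

def migration_schedule(hosts, epochs, seed=None):
    """Pick a random host each epoch, never staying on the current one.

    A toy illustration of moving-target-defense style service relocation.
    """
    rng = random.Random(seed)
    current = rng.choice(hosts)
    schedule = [current]
    for _ in range(epochs - 1):
        current = rng.choice([h for h in hosts if h != current])
        schedule.append(current)
    return schedule

# Hypothetical heterogeneous server pool hosting an access-control service.
pool = ["srv-a", "srv-b", "srv-c", "srv-d"]
print(migration_schedule(pool, epochs=6, seed=42))
# An attacker profiling any single server has only a short, unpredictable window.
```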

Item Open Access
On the design of a moving target defense framework for the resiliency of critical services in large distributed networks (Colorado State University. Libraries, 2018)
Amarnath, Athith, author; Ray, Indrajit, advisor; Ray, Indrakshi, committee member; Hayne, Stephen, committee member
Security is a very serious concern in this era of the digital world. Protecting and controlling access to secured data and services has placed more emphasis on access control enforcement and management. While access control enforcement with strong policies ensures data confidentiality, availability, and integrity, protecting the access control service itself is equally important. When these services are hosted on a single server for a lengthy period of time, the attackers have potentially unlimited time to periodically explore and enumerate the vulnerabilities with respect to the configuration of the server and launch targeted attacks on the service. The constant proliferation of cloud usage and distributed systems over the last decade has made it possible to distribute data or host services over a group of servers located in different geographical locations. Existing election algorithms used to provide service continuity in such distributed setups work well in a benign environment. However, these algorithms are not secure against skillful attackers who intend to manipulate or bring down the data or service. In this thesis, we design and implement the protection of critical services, such as access-control reference monitors, using the concept of moving target defense. This concept increases the level of difficulty faced by the attacker to compromise the point of service by periodically moving the critical service among a group of heterogeneous servers, thereby changing the attack surface and increasing uncertainty and randomness in the point of service chosen. We describe an efficient Byzantine fault-tolerant leader election protocol for small networks that achieves the security and performance goals described in the problem statement. We then extend this solution to large enterprise networks by introducing a random-walk protocol that randomly chooses a subset of servers taking part in the election protocol.

Item Open Access
On the design of a secure and anonymous publish-subscribe system (Colorado State University. Libraries, 2012)
Mulamba Kadimbadimba, Dieudonne, author; Ray, Indrajit, advisor; Ray, Indrakshi, committee member; Vijayasarathy, Leo, committee member
The reliability and the high availability of data have made online servers very popular among individual users and organizations like hospitals, insurance companies or administrations. This has led to an increased dissemination of personal data on public servers. These online companies are increasingly adopting publish-subscribe as a new model for storing and managing data on a distributed network. While bringing some real improvement in the way these online companies store and manage data in a dynamic and distributed environment, publish-subscribe is also bringing some new challenges of security and privacy. The centralization of personal data on public servers has raised citizens' concerns about their privacy. Several security breaches involving the leakage of personal data have occurred, showing us how crucial this issue has become. A significant amount of work has been done in the field of securing publish-subscribe systems. However, all of this research assumes that the server is a trusted entity, an assumption which ignores the fact that this server can be honest but curious. This leads to the need to develop a means to protect publishers and subscribers from server curiosity. One solution to this problem could be to anonymize all communications involving publishers and subscribers. This solution in turn raises another issue: how to allow a subscriber to query a file that was anonymously uploaded to the server by the publisher. In this work, we propose an implementation of a communication protocol that allows users to asynchronously and anonymously exchange messages and that also supports secure deletion of messages.

Item Open Access
Quantitative analyses of software vulnerabilities (Colorado State University. Libraries, 2011)
Joh, HyunChul, author; Malaiya, Yashwant K., advisor; Ray, Indrajit, committee member; Ray, Indrakshi, committee member; Jayasumana, Anura P., committee member
There have been numerous studies addressing computer security and software vulnerability management. Most of the time, they have taken a qualitative perspective. In many other disciplines, quantitative analyses have been indispensable for performance assessment, metric measurement, functional evaluation, or statistical modeling. Quantitative approaches can also help to improve software risk management by providing guidelines obtained by using actual data-driven analyses for optimal allocations of resources for security testing, scheduling, and development of security patches. Quantitative methods allow objective and more accurate estimates of future trends than qualitative approaches alone, because a quantitative approach uses real datasets with statistical methods, which have proven to be a very powerful prediction approach in several research fields. A quantitative methodology makes it possible for end-users to assess the risks posed by vulnerabilities in software systems, and potential breaches, without getting burdened by details of every individual vulnerability. At the moment, quantitative risk analysis in information security systems is still in its infancy. However, recently, researchers have started to explore various software vulnerability-related attributes quantitatively as the vulnerability datasets have now become large enough for statistical analyses. In this dissertation, quantitative analysis is presented dealing with (i) modeling vulnerability discovery processes in major Web servers and browsers, (ii) the relationship between the performance of S-shaped vulnerability discovery models and the skew in the vulnerability datasets examined, (iii) linear vulnerability discovery trends in multi-version software systems, (iv) periodic behavior in the weekly exploitation and patching of vulnerabilities as well as the long-term vulnerability discovery process, and (v) software security risk evaluation with respect to the vulnerability lifecycle and CVSS. Results show superior vulnerability discovery model fittings and reasonable prediction capabilities for both time-based and effort-based models for datasets from Web servers and browsers. Results also show that AML and Gamma-distribution-based models perform better than other S-shaped models for left-skewed and right-skewed datasets, respectively. We find that code sharing among the successive versions causes a linear discovery pattern. We establish that there are indeed long- and short-term periodic patterns in software vulnerability related activities which have been only vaguely recognized by security researchers. Lastly, a framework for software security risk assessment is proposed which can allow a comparison of software systems in terms of the risk and potential approaches for optimization of remediation.
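
The S-shaped vulnerability discovery models mentioned here can be made concrete with a small fitting example. The sketch below uses the logistic form commonly associated with the Alhazmi-Malaiya (AML) model, Omega(t) = B / (B*C*exp(-A*B*t) + 1), and fits it to made-up cumulative vulnerability counts with SciPy; the data, the starting guesses, and the parameterization details are illustrative assumptions, not results or code from the dissertation.

```python
import numpy as np
from scipy.optimize import curve_fit

def aml(t, A, B, C):
    """Logistic form commonly used for the AML vulnerability discovery model."""
    return B / (B * C * np.exp(-A * B * t) + 1.0)

# Made-up cumulative vulnerability counts by month, just to exercise the fit.
months = np.arange(1, 13)
counts = np.array([2, 3, 5, 9, 15, 24, 33, 40, 45, 47, 48, 49], dtype=float)

params, _ = curve_fit(aml, months, counts, p0=(0.01, 50.0, 1.0), maxfev=10000)
A, B, C = params
print(f"A={A:.4f}  B={B:.1f} (saturation level)  C={C:.3f}")
print("predicted cumulative count at month 15:", round(aml(15, *params), 1))
```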
Studying vulnerability markets and their related issues provides insight into their underlying mechanisms, which can be used to assess the risks and to develop approaches for reducing and mitigating them, thereby enhancing security against data breaches. Some aspects of vulnerabilities—discovery, dissemination, and disclosure—have received some recent attention. However, the role of the interaction between vulnerability discoverers and vulnerability acquirers has not yet been adequately addressed. This dissertation suggests that a major fraction of discoverers, a majority in some cases, are unaffiliated with the software developers and thus are free to disseminate the vulnerabilities they discover in any way they like. As a result, multiple vulnerability markets have emerged. In the recent vulnerability discovery literature, the vulnerability discoverers have remained anonymous. Although there has been an attempt to model the level of their efforts, information regarding their identities, modes of operation, and what they do with the discovered vulnerabilities has not been explored. Reports of buying and selling vulnerabilities are now appearing in the press; however, the nature of the actual vulnerability markets needs to be analyzed. We have attempted to collect detailed information. We have identified the most prolific vulnerability discoverers throughout the past decade and examined their motivations and methods. A large percentage of these discoverers are located outside of the US. We have contacted several of the most prolific discoverers in order to collect firsthand information regarding their techniques, motivations, and involvement in the vulnerability markets. We examine why many of the discoverers appear to retire after a highly successful vulnerability-finding career. We found that these discoverers had enough experience and a strong enough reputation to work officially, with a good salary, at some well-known software development companies. Many security breaches have been reported in the past few years, impacting both large and small organizations. Such breaches may occur through the exploitation of system vulnerabilities. There has been considerable disagreement about the overall cost and probability of such breaches. No significant formal studies have yet addressed this issue of risk assessment, though some proprietary approaches for evaluating partial data breach costs and probabilities have been implemented. These approaches have not been formally evaluated or compared and have not been systematically optimized. This study proposes a consolidated approach for identifying the key factors contributing to breach cost by minimizing redundancy among the factors. Existing approaches have been evaluated using data from some well-documented breaches. It is noted that the existing models yield widely different estimates. The reasons for this variation are examined and the need for better models is identified. A complete computational model for estimating the costs and probabilities of data breaches for a given organization has been developed. We consider both fixed and variable costs as well as economies of scale. Assessing the impact of data breaches allows organizations to evaluate the risks due to potential breaches and to determine the optimal level of resources and effort needed to achieve target levels of security.Item Open Access Sentiment analysis in the Arabic language using machine learning(Colorado State University.
Libraries, 2015) Alotaibi, Saud Saleh, author; Anderson, Charles W., advisor; Ben-Hur, Asa, committee member; Ray, Indrakshi, committee member; Peterson, Chris, committee memberSentiment analysis has recently become one of the growing areas of research related to natural language processing and machine learning. A great deal of opinion and sentiment about specific topics is available online, which allows several parties, such as customers, companies, and even governments, to explore these opinions. The first task is to classify text according to whether it expresses opinion or factual information. The second task, polarity classification, distinguishes between the polarities (positive, negative, or neutral) that sentences may carry. The analysis of natural language text for the identification of subjectivity and sentiment has been well studied for the English language. In contrast, the work that has been carried out for Arabic remains in its infancy; thus, more cooperation is required among research communities in order to offer a mature sentiment analysis system for Arabic. There are recognized challenges in this field, some of which are inherited from the nature of the Arabic language itself, while others derive from the scarcity of tools and sources. This dissertation provides the rationale behind the current work and proposes methods to enhance the performance of sentiment analysis in the Arabic language. The first step is to increase the resources that help in the analysis process; the most important part of this task is to have annotated sentiment corpora. Several free corpora are available for the English language, but these resources are still limited in other languages, such as Arabic. This dissertation describes the work undertaken by the author to enrich sentiment analysis in Arabic by building a new Arabic Sentiment Corpus. The data are labeled not only with the two polarities (positive and negative) but also with a neutral label during the annotation process. The second step includes the proposal of features that may capture sentiment orientation in Arabic, as well as the use of different machine learning classifiers that may work better and capture the non-linearity of a richly morphological and highly inflectional language such as Arabic. Different types of features are proposed, each trying to capture different aspects and characteristics of Arabic; morphological, semantic, and stylistic features are proposed and investigated. With regard to the classifier, the performance of linear and nonlinear machine learning approaches was compared. The results are promising for the continued use of nonlinear ML classifiers for this task. Learning from a particular dataset domain and applying that knowledge to a different domain is a useful method when resources are limited, as with the Arabic language. This dissertation shows and discusses the possibility of applying cross-domain learning in the field of Arabic sentiment analysis. It also indicates the feasibility of using different mechanisms of the cross-domain method. Other work in this dissertation includes an exploration of the effect of negation on Arabic subjectivity and polarity classification. Negation word lists were devised to help in this and other natural language processing tasks. These words cover both Modern Standard Arabic and some dialects.
Two methods of dealing with negation in Arabic sentiment analysis were proposed. The first method is based on a static approach that treats each sentence containing negation words as a negated sentence. To determine the scope of the negation, different techniques were proposed, using different word-window sizes or base-phrase chunks. The second approach relies on a dynamic method that requires an annotated negation dataset in order to build a model that determines whether or not a sentence is actually negated by the negation words and establishes the effect of the negation on the sentence. The results achieved by adding negation handling to Arabic sentiment analysis were promising and indicate that negation has an effect on this task. Finally, the experiments and evaluations conducted in this dissertation encourage researchers to continue in this direction of research.
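As a small illustration of the static, window-based negation strategy described above, the sketch below appends a _NEG marker to tokens that fall within a fixed window after a negation particle; the negation list, window size, and example sentence are illustrative assumptions rather than the dissertation's actual word lists or code.

    # Static, window-based negation marking (illustrative sketch).
    NEGATION_WORDS = {"لا", "لن", "لم", "ليس", "ما"}  # small illustrative Modern Standard Arabic list

    def mark_negation(tokens, window=3):
        """Append _NEG to up to `window` tokens that follow a negation word."""
        marked, remaining = [], 0
        for tok in tokens:
            if tok in NEGATION_WORDS:
                marked.append(tok)
                remaining = window           # open a new negation scope
            elif remaining > 0:
                marked.append(tok + "_NEG")  # token lies inside the negation window
                remaining -= 1
            else:
                marked.append(tok)
        return marked

    # "I do not like this movie at all" (Modern Standard Arabic, whitespace-tokenized).
    sentence = "لا أحب هذا الفيلم أبدا".split()
    print(mark_negation(sentence, window=2))

The dynamic approach described in the abstract would replace the fixed window with a model trained on an annotated negation dataset, deciding both whether the sentence is actually negated and how far the negation scope extends.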