Browsing by Author "Ray, Indrajit, committee member"
Now showing 1 - 20 of 26
Item Open Access A graph-based, systems approach for detecting violent extremist radicalization trajectories and other latent behaviors (Colorado State University. Libraries, 2017) Hung, Benjamin W. K., author; Jayasumana, Anura P., advisor; Chong, Edwin K. P., committee member; Ray, Indrajit, committee member; Sega, Ronald M., committee member
The number and lethality of violent extremist plots motivated by the Salafi-jihadist ideology have been growing for nearly a decade in both the U.S. and Western Europe. While detecting the radicalization of violent extremists is a key component in preventing future terrorist attacks, it remains a significant challenge to law enforcement due to issues of both scale and dynamics. Recent terrorist attack successes highlight the real possibility of missed signals from, or continued radicalization by, individuals whom the authorities had formerly investigated and even interviewed. Additionally, beyond considering just the behavioral dynamics of a person of interest, investigators need to consider the behaviors and activities of social ties vis-à-vis the person of interest. We undertake a fundamentally systems approach in addressing these challenges by investigating the need for and feasibility of a radicalization detection system, a risk assessment assistance technology for law enforcement and intelligence agencies. The proposed system first mines public data and government databases for individuals who exhibit risk indicators for extremist violence, and then enables law enforcement to monitor those individuals at a scope and scale that is lawful, and to account for the dynamic indicative behaviors of the individuals and their associates rigorously and automatically. In this thesis, we first identify the operational deficiencies of current law enforcement and intelligence agency efforts, investigate the environmental conditions and stakeholders most salient to the development and operation of the proposed system, and address both programmatic and technical risks with several initial mitigating strategies. We codify this large effort into a radicalization detection system framework. The main thrust of this effort is the investigation of technological opportunities for identifying individuals matching a radicalization pattern of behaviors in the proposed radicalization detection system. We frame our technical approach as a unique dynamic graph pattern matching problem, and develop a technology called INSiGHT (Investigative Search for Graph Trajectories) to help identify individuals or small groups with subgraphs conforming to a radicalization query pattern, and to follow the match trajectories over time. INSiGHT is aimed at assisting law enforcement and intelligence agencies in monitoring and screening for those individuals whose behaviors indicate a significant risk for violence, and allows for better prioritization of limited investigative resources. We demonstrated the performance of INSiGHT on a variety of datasets, including small synthetic radicalization-specific datasets, a real behavioral dataset of time-stamped radicalization indicators of recent U.S. violent extremists, and a large, real-world BlogCatalog dataset serving as a proxy for the type of intelligence or law enforcement data networks that could be utilized to track the radicalization of violent extremists.
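The dynamic graph pattern matching idea described above can be illustrated with a minimal sketch. This is not the INSiGHT implementation; the indicator names, the set-based "pattern," and the match-fraction scoring are hypothetical simplifications of matching a query pattern against a stream of time-stamped behaviors and following the match trajectory over time.

```python
from collections import defaultdict

# Hypothetical query pattern: indicator behaviors that together constitute a
# full match (a real radicalization pattern would be a graph, not a flat set).
QUERY_PATTERN = {"ideology_post", "contact_recruiter", "acquire_materials"}

def match_trajectories(events):
    """events: iterable of (timestamp, person, indicator) tuples.
    Returns, per person, the time-stamped growth of their partial match."""
    matched = defaultdict(set)        # person -> indicators observed so far
    trajectory = defaultdict(list)    # person -> [(timestamp, match_fraction)]
    for ts, person, indicator in sorted(events):
        if indicator in QUERY_PATTERN:
            matched[person].add(indicator)
            frac = len(matched[person]) / len(QUERY_PATTERN)
            trajectory[person].append((ts, frac))
    return trajectory

# Toy usage: person "a" accumulates a full match over time, "b" does not.
events = [(1, "a", "ideology_post"), (2, "b", "ideology_post"),
          (3, "a", "contact_recruiter"), (5, "a", "acquire_materials")]
print(match_trajectories(events)["a"])   # [(1, 0.33...), (3, 0.66...), (5, 1.0)]
```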
We also extended INSiGHT by developing a non-combinatorial neighbor matching technique to enable analysts to maintain visibility of potential collective threats and conspiracies and to account for the role close social ties have in an individual's radicalization. This enhancement was validated on small, synthetic radicalization-specific datasets as well as the large BlogCatalog dataset with real social network connections and tagging behaviors for over 80K accounts. The results showed that our algorithm returned whole and partial subgraph matches that enabled analysts to gain and maintain visibility on neighbors' activities. Overall, INSiGHT led to consistent, informed, and reliable assessments about those who pose a significant risk for some latent behavior in a variety of settings. Based upon these results, we maintain that INSiGHT is a feasible and useful supporting technology with the potential to optimize law enforcement investigative efforts and ultimately help prevent individuals from carrying out extremist violence. Although the prime motivation of this research is the detection of violent extremist radicalization, we found that INSiGHT is applicable to detecting latent behaviors in other domains such as on-line student assessment and consumer analytics. This utility was demonstrated through experiments with real data. For on-line student assessment, we tested INSiGHT on a MOOC dataset of students and time-stamped on-line course activities to predict those students who persisted in the course. For consumer analytics, we tested the performance on a real, large proprietary consumer activities dataset from a home improvement retailer. Lastly, motivated by the desire to validate INSiGHT as a screening technology when ground truth is known, we developed a synthetic data generator of large-population, time-stamped, individual-level consumer activities data consistent with an a priori project set designation (latent behavior). This contribution also sets the stage for future work in developing an analogous synthetic data generator for radicalization indicators to serve as a testbed for INSiGHT and other data mining algorithms.

Item Open Access A unified modeling language framework for specifying and analyzing temporal properties (Colorado State University. Libraries, 2018) Al Lail, Mustafa, author; France, Robert B., advisor; Ray, Indrakshi, advisor; Ray, Indrajit, committee member; Hamid, Idris Samawi, committee member; Malaiya, Yashwant K., committee member
In the context of Model-Driven Engineering (MDE), designers use the Unified Modeling Language (UML) to create models that drive the entire development process. Once UML models are created, MDE techniques automatically generate code from the models. If the models have undetected faults, they are propagated to code, where they require considerable time and effort to detect and correct. It is therefore mandatory to analyze UML models at earlier stages of the development life-cycle to ensure the success of the MDE techniques in producing reliable software. One approach to uncovering design errors is to formally specify and analyze the properties that a system has to satisfy. Although significant research exists on specifying and analyzing properties, there is no effective and efficient UML-based framework for specifying and analyzing temporal properties. The contribution of this dissertation is a UML-based framework and tools for aiding UML designers in effectively and efficiently specifying and analyzing temporal properties.
In particular, the framework is composed of 1) a UML specification technique that designers can use to specify temporal properties, 2) a rigorous analysis technique for analyzing temporal properties, 3) an optimization technique to scale the analysis to large class models, and 4) a proof-of-concept tool. An evaluation of the framework using two real-world studies shows that the specification technique can be used to specify a variety of temporal properties and that the analysis technique can uncover certain types of design faults. It also demonstrates that the optimization technique can significantly speed up the analysis.

Item Open Access An access control framework for mobile applications (Colorado State University. Libraries, 2013) Abdunabi, Ramadan, author; Ray, Indrakshi, advisor; France, Robert, committee member; Ray, Indrajit, committee member; Turk, Daniel, committee member
With the advent of wireless and mobile devices, many new applications are being developed that make use of the spatio-temporal information of a user in order to provide better functionality. Such applications also necessitate sophisticated authorization models where access to a resource depends on the credentials of the user and also on the location and time of access. Consequently, traditional access control models, such as Role-Based Access Control (RBAC), have been augmented to provide spatio-temporal access control. However, the pace of technological development introduces sophisticated constraints that earlier works may not be able to support. In this dissertation, we provide an access control framework that allows one to specify, verify, and enforce spatio-temporal policies of mobile applications. Our specification of spatio-temporal access control improves expressiveness over earlier works by providing features that are useful for mobile applications. Thus, an application using our model can specify different types of spatio-temporal constraints. It defines a number of novel concepts that allow ease of integration of access control policies with applications and make policy models more amenable to analysis. Our access control models are presented using both theoretical and practical methods. Our models have numerous features that may interact to produce conflicts. To address this, we also develop automated analysis approaches for conflict detection and correction at the model and application levels. These approaches rigorously check policy models and provide feedback when some properties do not hold. For strict temporal behavior, our analysis can be used to perform a quantitative verification of the temporal properties while considering mobility. We also provide a number of techniques to reduce the state-space explosion problem that is inherent in model checkers. Furthermore, we introduce a policy enforcement mechanism that illustrates the practical viability of our models and discusses potential challenges with possible solutions. Specifically, we propose an event-based architecture for enforcing spatio-temporal access control and demonstrate its feasibility by developing a prototype. We also provide a number of protocols for granting and revoking access and formally analyze these protocols in order to provide assurance that our proposed architecture is indeed secure.

Item Open Access Anomaly detection and explanation in big data (Colorado State University. Libraries, 2021) Homayouni, Hajar, author; Ghosh, Sudipto, advisor; Ray, Indrakshi, advisor; Bieman, James M., committee member; Ray, Indrajit, committee member; Vijayasarathy, Leo R., committee member
Data quality tests are used to validate the data stored in databases and data warehouses, and to detect violations of syntactic and semantic constraints. Domain experts grapple with capturing all the important constraints and checking that they are satisfied. The constraints are often identified in an ad hoc manner based on the knowledge of the application domain and the needs of the stakeholders. Constraints can exist over single or multiple attributes as well as over records involving time series and sequences. The constraints involving multiple attributes can involve both linear and non-linear relationships among the attributes. We propose ADQuaTe as a data quality test framework that automatically (1) discovers different types of constraints from the data, (2) marks records that violate the constraints as suspicious, and (3) explains the violations. Domain knowledge is required to determine whether or not the suspicious records are actually faulty. The framework can incorporate feedback from domain experts to improve the accuracy of constraint discovery and anomaly detection. We instantiate ADQuaTe in two ways to detect anomalies in non-sequence and sequence data. The first instantiation (ADQuaTe2) uses an unsupervised approach, an autoencoder, for constraint discovery in non-sequence data. ADQuaTe2 is based on analyzing records in isolation to discover constraints among the attributes. We evaluate the effectiveness of ADQuaTe2 using real-world non-sequence datasets from the human health and plant diagnosis domains. We demonstrate that ADQuaTe2 can discover new constraints that were previously unspecified in existing data quality tests, and can report both previously detected and new faults in the data. We also use non-sequence datasets from the UCI repository to evaluate the improvement in the accuracy of ADQuaTe2 after incorporating ground truth knowledge and retraining the autoencoder model. The second instantiation (IDEAL) uses an unsupervised LSTM-autoencoder for constraint discovery in sequence data. IDEAL analyzes the correlations and dependencies among data records to discover constraints. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and the Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies from these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy of the approach improves after incorporating ground truth knowledge about the injected faults and retraining the LSTM-autoencoder model. The novelty of this research lies in the development of a domain-independent framework that effectively and efficiently discovers different types of constraints from the data, detects and explains anomalous data, and minimizes false alarms through an interactive learning process.

Item Embargo Automated extraction of access control policy from natural language documents (Colorado State University. Libraries, 2023) Alqurashi, Saja, author; Ray, Indrakshi, advisor; Ray, Indrajit, committee member; Malaiya, Yashwant, committee member; Simske, Steve, committee member
Data security and privacy are fundamental requirements in information systems.
The first step to providing data security and privacy for organizations is defining access control policies (ACPs). Security requirements are often expressed in natural language, and ACPs are embedded in the security requirements. However, ACPs in natural language are unstructured and ambiguous, so manually extracting ACPs from security requirements and translating them into enforceable policies is tedious, complex, expensive, labor-intensive, and error-prone. Thus, automating the ACP specification process is crucial. In this thesis, we consider the Next Generation Access Control (NGAC) model as our reference formal access control model to study the automation process. This thesis addresses the research question: How do we automatically translate access control policies (ACPs) from natural language expression to the NGAC formal specification? Answering this research question entails building an automated extraction framework. The proposed framework aims to translate natural language ACPs into NGAC specifications automatically. The primary contributions of this research are developing models that automatically construct ACPs in the NGAC specification from natural language and generating a realistic synthetic dataset of access control policy sentences to evaluate the proposed framework. Our experimental results are promising: we achieved, on average, an F1-score of 93% when identifying ACP sentences, an F1-score of 96% when extracting NGAC relations between attributes, and F1-scores of 96% when extracting user attributes and 89% when extracting object attributes from natural language access control policies.

Item Open Access Behavioral complexity analysis of networked systems to identify malware attacks (Colorado State University. Libraries, 2020) Haefner, Kyle, author; Ray, Indrakshi, advisor; Ben-Hur, Asa, committee member; Gersch, Joe, committee member; Hayne, Stephen, committee member; Ray, Indrajit, committee member
Internet of Things (IoT) environments are often composed of a diverse set of devices that span a broad range of functionality, making them a challenge to secure. This diversity of function leads to a commensurate diversity in network traffic: some devices have simple network footprints and some have complex ones. This complexity in a device's network traffic provides a differentiator that can be used by the network to distinguish which devices are most effectively managed autonomously and which are not. This study proposes an informed autonomous learning method that quantifies the complexity of a device based on historic traffic and applies this complexity metric to build a probabilistic model of the device's normal behavior using a Gaussian Mixture Model (GMM). This method results in an anomaly detection classifier with inlier probability thresholds customized to the complexity of each device without requiring labeled data. The model efficacy is then evaluated using seven common types of real malware traffic and across four device datasets of network traffic: one residential-based, two from labs, and one consisting of commercial automation devices.
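As a rough illustration of the per-device modeling described above, a Gaussian Mixture Model can be fit to a device's historical traffic features and low-likelihood flows flagged as anomalous. This is a sketch only, not the dissertation's method; the feature choice, the synthetic data, and the percentile threshold rule are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Assumed per-flow features for one device, e.g. [bytes, duration]; values are synthetic.
normal_traffic = rng.normal(loc=[500.0, 1.0], scale=[50.0, 0.2], size=(1000, 2))

# Fit a probabilistic model of the device's normal behavior.
gmm = GaussianMixture(n_components=2, random_state=0).fit(normal_traffic)

# Threshold on log-likelihood; a simple device (tight traffic distribution) could
# use a stricter percentile than a complex one, mirroring the complexity idea above.
threshold = np.percentile(gmm.score_samples(normal_traffic), 1)

def is_anomalous(flow):
    """Flag a new flow whose likelihood under the device model is unusually low."""
    return gmm.score_samples(np.atleast_2d(flow))[0] < threshold

print(is_anomalous([510.0, 1.1]))    # False: consistent with normal traffic
print(is_anomalous([5000.0, 30.0]))  # True: far outside the learned model
```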
The results of the analysis of over 100 devices and 800 experiments show that the model leads to highly accurate representations of the devices and a strong correlation between the measured complexity of a device and the accuracy to which its network behavior can be modeled.

Item Open Access Capture and reconstruction of the topology of undirected graphs from partial coordinates: a matrix completion based approach (Colorado State University. Libraries, 2017) Ramasamy, Sridhar, author; Jayasumana, Anura, advisor; Paffenroth, Randy, committee member; Ray, Indrajit, committee member; Pasricha, Sudeep, committee member
With the advancement of science and technology, new types of complex networks have become commonplace across varied domains such as computer networks, the Internet, bio-technological studies, sociology, and condensed matter physics. The surge of interest in research on graphs and topology can be attributed to important applications such as graph representation of words in computational linguistics, identification of terrorists for national security, studying complicated atomic structures, and modeling connectivity in condensed matter physics. Well-known social networks such as Facebook and Twitter have millions of users, while the science citation index is a repository of millions of records and citations. These examples indicate the importance of efficient techniques for measuring, characterizing, and mining large and complex networks. Analysis of graph attributes to understand the topology and embedded properties of these complex graphs often becomes difficult due to the need to process huge data volumes, the lack of compressed representation forms, and the lack of complete information. Due to improper or inadequate acquisition processes, inaccessibility, and similar issues, we often end up with partial graph representation data. Thus there is immense significance in being able to extract this missing information from the available data. Therefore, obtaining the topology of a graph, such as a communication network or a social network, from incomplete information is our research focus. Specifically, this research addresses the problem of capturing and reconstructing the topology of a network from a small set of path length measurements. An accurate solution to this problem also provides a means of describing graphs with a compressed representation. A technique to obtain the topology from only a partial set of information about network paths is presented. Specifically, we demonstrate the capture of the network topology from a small set of measurements corresponding to a) shortest hop distances of nodes with respect to a small set of nodes called anchors, or b) a set of pairwise hop distances between random node pairs. These two measurement sets can be related to the distance matrix D, a common representation of the topology, in which an entry contains the shortest hop distance between two nodes. In an anchor-based method, the shortest hop distances of nodes to a set of M anchors constitute what is known as a Virtual Coordinate (VC) matrix, a submatrix of the columns of D corresponding to the anchor nodes. Random pairwise measurements correspond to a random subset of the elements of D. The proposed technique depends on a low-rank matrix completion method based on extended Robust Principal Component Analysis to extract the unknown elements.
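To make the low-rank completion idea concrete, the sketch below fills in hidden entries of a partially observed matrix by alternating a fixed-rank SVD approximation with re-imposing the known entries. This is not the extended Robust PCA method used in the work above; the rank, iteration count, and synthetic "distance-like" matrix are assumptions for illustration.

```python
import numpy as np

def complete_low_rank(D_obs, mask, rank=3, iters=200):
    """Fill missing entries of a partially observed matrix by alternating
    between a rank-`rank` SVD approximation and re-imposing known entries."""
    X = np.where(mask, D_obs, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
        X[mask] = D_obs[mask]                      # keep the observed entries
    return X

# Synthetic low-rank matrix standing in for D, with roughly 60% of entries hidden.
rng = np.random.default_rng(1)
A = rng.random((40, 3))
D = A @ A.T
mask = rng.random(D.shape) < 0.4        # True where an entry is observed
D_hat = complete_low_rank(D, mask)
print(np.abs(D_hat - D)[~mask].mean())  # mean error on the hidden entries (should be small)
```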
The application of the principles of matrix completion relies on the conjecture that many natural datasets are inherently low dimensional and thus the corresponding matrix has relatively low rank. We demonstrate that this is applicable to D for many large-scale networks as well. Thus we are able to use results from the theory of matrix completion for capturing the topology. Two important types of graphs have been used for evaluation of the proposed technique, namely Wireless Sensor Network (WSN) graphs and social network graphs. For WSN examples, we use the Topology Preserving Map (TPM), which is a homeomorphic representation of the original layout, to evaluate the effectiveness of the technique from partial sets of entries of the VC matrix. A double-centering-based approach is used to evaluate the TPMs from VCs, in comparison with the existing non-centered approach. Results are presented for both random anchors and nodes that are farthest apart on the boundaries. The idea of obtaining the topology is extended to social network link prediction. The significance of this result lies in the fact that, with increasing privacy concerns, obtaining the data in the form of a VC matrix or a hop-distance matrix becomes difficult. Predicting the unknown entries of a matrix thus provides a novel approach to social network link prediction, and it is supported by the fact that the distance matrices of most real-world networks are naturally low rank. The accuracy of the proposed techniques is evaluated using 4 different WSNs and 3 different social networks. Two 2D and two 3D networks have been used for WSNs, with the number of nodes ranging from 500 to 1600. We are able to obtain accurate TPMs for both random anchors and extreme anchors with only 20% to 40% of VC matrix entries. The mean error quantifies the error introduced in TPMs due to unknown entries. The results indicate that even with 80% of entries missing, the mean error is around 35% to 45%. The Facebook, Collaboration, and Enron Email subnetworks, with 744, 4158, and 3892 nodes respectively, have been used for social network capture. The results obtained are very promising. With 80% of the information missing in the hop-distance matrix, a maximum error of only around 6% is incurred. The error in prediction of hop distance is less than 0.5 hops. This also opens up the idea of a compressed representation of a network by its VC matrix.

Item Open Access Dynamic representation of consecutive-ones matrices and interval graphs (Colorado State University. Libraries, 2015) Springer, William M., II, author; McConnell, Ross M., advisor; Ray, Indrajit, committee member; Bohm, Wim, committee member; Hulpke, Alexander, committee member
We give an algorithm for updating a consecutive-ones ordering of a consecutive-ones matrix when a row or column is added or deleted. When the addition of the row or column would result in a matrix that does not have the consecutive-ones property, we return a well-known minimal forbidden submatrix for the consecutive-ones property, known as a Tucker submatrix, which serves as a certificate of correctness of the output in this case, in O(n log n) time. The ability to return such a certificate within this time bound is one of the new contributions of this work. Using this result, we obtain an O(n) algorithm for updating an interval model of an interval graph when an edge or vertex is added or deleted. This matches the bounds obtained by a previous dynamic interval-graph recognition algorithm due to Crespelle.
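For readers unfamiliar with the property being maintained above, a small checker makes it concrete: a 0/1 matrix has the consecutive-ones property for a given column ordering when the 1s in every row form one contiguous block. This sketch only verifies a fixed ordering; it is not the dynamic update algorithm of the work above, and the example matrices are made up.

```python
def has_consecutive_ones(matrix):
    """Check whether, in the given column order, the 1s in every row are consecutive."""
    for row in matrix:
        ones = [j for j, v in enumerate(row) if v == 1]
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True

# In this column ordering, every row's 1s form one contiguous block.
print(has_consecutive_ones([[1, 1, 0, 0],
                            [0, 1, 1, 1],
                            [0, 0, 1, 1]]))   # True
# Reordering columns can break the property.
print(has_consecutive_ones([[1, 0, 1, 0],
                            [0, 1, 1, 1],
                            [0, 1, 0, 1]]))   # False
```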
We improve on Crespelle's result by producing an easy-to-check certificate, known as a Lekkerkerker-Boland subgraph, when a proposed change to the graph results in a graph that is not an interval graph. Our algorithm takes O(n log n) time to produce this certificate. The ability to return such a certificate within this time bound is the second main contribution of this work.

Item Open Access Enhancing collaborative peer-to-peer systems using resource aggregation and caching: a multi-attribute resource and query aware approach (Colorado State University. Libraries, 2012) Bandara, H. M. N. Dilum, author; Jayasumana, Anura P., advisor; Chandrasekar, V., committee member; Massey, Daniel F., committee member; Ray, Indrajit, committee member
To view the abstract, please see the full text of the document.

Item Open Access Explicit and quantitative results for abelian varieties over finite fields (Colorado State University. Libraries, 2022) Krause, Elliot, author; Achter, Jeffrey, advisor; Pries, Rachel, committee member; Juul, Jamie, committee member; Ray, Indrajit, committee member
Let E be an ordinary elliptic curve over a prime field F_p. Attached to E is the characteristic polynomial of the Frobenius endomorphism, T^2 − a_1 T + p, which controls several of the invariants of E, such as the point count and the size of the isogeny class. As we base change E over extensions F_{p^n}, we may study the distributions of both of these invariants. Additionally, we look to quantify the rate at which these distributions converge to the expected distribution. More generally, one may consider these same questions for collections of ordinary elliptic curves and abelian varieties.

Item Open Access Extraction, characterization and modeling of network data features - a compressive sensing and robust PCA based approach (Colorado State University. Libraries, 2015) Bandara, Vidarshana W., author; Jayasumana, Anura P., advisor; Pezeshki, Ali, advisor; Scharf, Louis L., committee member; Ray, Indrajit, committee member; Luo, J. Rockey, committee member
To view the abstract, please see the full text of the document.

Item Open Access Helping humans and agents avoid undesirable consequences with models of intervention (Colorado State University. Libraries, 2021) Weerawardhana, Sachini Situmini, author; Whitley, Darrell, advisor; Ray, Indrajit, committee member; Pallickara, Sangmi, committee member; Ortega, Francisco, committee member; Seger, Carol, committee member
When working in an unfamiliar online environment, it can be helpful to have an observer that can intervene and guide a user toward a desirable outcome while avoiding undesirable outcomes or frustration. The Intervention Problem is deciding when to intervene in order to help a user. The Intervention Problem is similar to, but distinct from, Plan Recognition because the observer must not only recognize the intended goals of a user but also decide when to intervene to help the user when necessary. In this dissertation, we formalize a family of intervention problems to address two sub-problems: (1) the Intervention Recognition Problem, and (2) the Intervention Recovery Problem. The Intervention Recognition Problem views the environment as a state transition system where an agent (or a human user), in order to achieve a desirable outcome, executes actions that change the environment from one state to the next.
Some states in the environment are undesirable, the user does not have the ability to recognize them, and the intervening agent wants to help the user avoid those undesirable states. In this dissertation, we model the environment as a classical planning problem and discuss three intervention models to address the Intervention Recognition Problem. The three models address different dimensions of the Intervention Recognition Problem, specifically the actors in the environment, the information hidden from the intervening agent, the type of observations, and the noise in the observations. The first model, Intervention by Recognizing Actions Enabling Multiple Undesirable Consequences, is motivated by a study in which we observed how home computer users practice cyber-security and take actions that unwittingly put their online safety at risk. The model is defined for an environment in which three agents are present: the user, the attacker, and the intervening agent. The intervening agent helps the user reach a desirable goal that is hidden from the intervening agent by recognizing critical actions that enable multiple undesirable consequences. We view the problem of recognizing critical actions as a multi-factor decision problem over three domain-independent metrics: certainty, timeliness, and desirability. The three metrics capture the trade-off between the safety and freedom of the observed agent when selecting critical actions on which to intervene. In the second model, Intervention as Classical Planning, we model scenarios where the intervening agent observes a user and a competitor attempting to achieve different goals in the same environment. A key difference in this model compared to the first is that the intervening agent is aware of the user's desirable goal and the undesirable state. The intervening agent exploits the classical planning representation of the environment and uses automated planning to project the possible outcomes in the environment exactly and approximately. To recognize when intervention is required, the observer analyzes the plan suffixes leading to the user's desirable goal and the undesirable state and uses machine learning to learn the differences between the plans that achieve the desirable goal and the plans that achieve the undesirable state. As in the first model, learning the differences between the safe and unsafe plans allows the intervening agent to balance intervening on specific actions against allowing the user some freedom. In the third model, Human-aware Intervention, we assume that the user is a human solving a cognitively engaging planning task. When human users plan, unlike an automated planner, they do not have the ability to use heuristics to search for the best solution. They often make mistakes and spend time exploring the search space of the planning problem. The complication this adds to the Intervention Recognition Problem is that deciding to intervene by analyzing plan suffixes generated by an automated planner is no longer feasible. Using a cognitively engaging puzzle-solving task (Rush Hour), we study how human users solve the puzzle as a planning task and develop the Human-aware Intervention model combining automated planning and machine learning. The intervening agent uses a domain-specific feature set more appropriate for human behavior to decide in real time whether to intervene.
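The first model above frames the intervention decision as a trade-off among certainty, timeliness, and desirability. A hypothetical weighted-score sketch of such a multi-factor decision is shown below; the weights, threshold, metric scaling, and the interpretation that lower desirability pushes toward intervening are all assumptions, not the dissertation's formulation.

```python
def should_intervene(certainty, timeliness, desirability,
                     weights=(0.5, 0.3, 0.2), threshold=0.6):
    """Combine three metrics (each assumed to be scaled to [0, 1]) into a single
    score and intervene when it crosses a threshold. Higher certainty/timeliness
    and lower desirability push toward intervening."""
    w_c, w_t, w_d = weights
    score = w_c * certainty + w_t * timeliness + w_d * (1.0 - desirability)
    return score >= threshold

# Observed action strongly enables an undesirable consequence and time is short.
print(should_intervene(certainty=0.9, timeliness=0.8, desirability=0.2))  # True
# Ambiguous early action: preserve the user's freedom and do not intervene.
print(should_intervene(certainty=0.3, timeliness=0.2, desirability=0.7))  # False
```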
Our experiments using benchmark planning domains and human subject studies show that the three intervention recognition models outperform existing plan recognition algorithms in predicting when intervention is required. Our solution to the Intervention Recovery Problem goes beyond typical preventative measures to help the human user recover from intervention. We propose the Interactive Human-aware Intervention, where a human user solves a cognitively engaging planning task with the assistance of an agent that implements the Human-aware Intervention. The Interactive Human-aware Intervention is different from typical preventive measures, where the agent executes actions to modify the domain such that the undesirable plan cannot progress (e.g., by blocking an action). Our approach interactively guides the human user toward the solution to the planning task by revealing information about the remaining planning task. We evaluate the Interactive Human-aware Intervention using both subjective and objective measures in a human subject study.

Item Open Access Integration of task-attribute based access control model for mobile workflow authorization and management (Colorado State University. Libraries, 2019) Basnet, Rejina, author; Ray, Indrakshi, advisor; Abdunabi, Ramadan, advisor; Ray, Indrajit, committee member; Vijayasarathy, Leo R., committee member
Workflow is the automation of process logistics for managing tasks ranging from simple everyday activities to complex multi-user tasks. By defining a workflow with proper constraints, an organization can improve its efficiency, responsiveness, profitability, and security. In addition, mobile technology and cloud computing have enabled wireless data transmission and receipt and allow workflows to be executed at any time and from any place. At the same time, security concerns arise because unauthorized users may get access to sensitive data or services from lost or stolen nomadic devices. Additionally, some tasks and their associated information are location- and time-sensitive in nature. These security and usability challenges demand the employment of access control in a mobile workflow system to express fine-grained authorization rules for actors to perform tasks on-site and at certain time intervals. For example, if an individual is assigned a task to survey a certain location, it is crucial that the individual is present at that location while entering the data and that all the data entered remotely is safe and secure. In this work, we formally defined an authorization model for mobile workflows. The authorization model was based on the NIST Next Generation Access Control (NGAC) model, where user attributes, resource attributes, and environment attributes decide who has access to which resources. In our model, we introduced the concept of a spatio-temporal zone attribute that captures the time and location at which tasks can be executed. The model also captured the relationships between the various components and identified how they depend on time and location. It captured separation of duty constraints that prevented an authorized user from executing conflicting tasks, and task dependency constraints that imposed further restrictions on who could execute the tasks. The model was dynamic and allowed the access control configuration to change through obligations. The model had various constraints that may conflict with each other or introduce inconsistencies.
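A hypothetical check of the spatio-temporal zone idea described above is sketched here: access to a task is granted only when the task is assigned to the user and the request falls inside the task's location and time window. The data structures, names, and zone format are illustrative assumptions and are far simpler than the NGAC-based model in the item above.

```python
from datetime import datetime

# Hypothetical task assignments and spatio-temporal zones.
TASK_ZONES = {
    "survey_site_A": {"location": "site_A",
                      "window": (datetime(2019, 5, 1, 8), datetime(2019, 5, 1, 18))}
}
USER_TASKS = {"alice": {"survey_site_A"}}

def may_execute(user, task, location, when):
    """Grant access only if the task is assigned to the user and the request
    falls inside the task's spatio-temporal zone."""
    if task not in USER_TASKS.get(user, set()):
        return False
    zone = TASK_ZONES[task]
    start, end = zone["window"]
    return location == zone["location"] and start <= when <= end

print(may_execute("alice", "survey_site_A", "site_A", datetime(2019, 5, 1, 9)))   # True
print(may_execute("alice", "survey_site_A", "site_B", datetime(2019, 5, 1, 9)))   # False
```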
To check for such conflicts and inconsistencies, we simulated the model using Timed Colored Petri Nets (TCPN) and ran queries to ensure the integrity of the model. The access control information was stored in the Neo4j graph database. We demonstrated the feasibility and usefulness of this method through performance analysis. Overall, we explored and verified the necessity of access control for both the security and the management of workflows. This work resulted in the development of secure, accountable, transparent, efficient, and usable workflows that could be deployed by enterprises.

Item Open Access Machine learning-based fusion studies of rainfall estimation from spaceborne and ground-based radars (Colorado State University. Libraries, 2019) Tan, Haiming, author; Anderson, Charles W., advisor; Chandra, Chandrasekar V., advisor; Ray, Indrajit, committee member; Chavez, Jose L., committee member
Precipitation measurement by satellite radar plays a significant role in studying the water cycle and forecasting extreme weather events. The Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR) is capable of providing a high-resolution vertical profile of precipitation over tropical regions. Its successor, the Global Precipitation Measurement (GPM) Dual-frequency Precipitation Radar (DPR), can provide detailed information on the microphysical properties of precipitation particles, quantify particle size distribution, and quantitatively measure light rain and falling snow. This thesis presents a novel machine learning system for ground-based and spaceborne radar rainfall estimation. The system first trains a rainfall estimator on ground radar data using rainfall measurements from gauges, and subsequently uses the ground-radar-based rainfall estimates to train on spaceborne radar data in order to obtain a space-based rainfall product. Therein, data alignment between spaceborne and ground radar is conducted using the methodology proposed by Bolen and Chandrasekar (2013), which can minimize the effects of potential geometric distortion of spaceborne radar observations. For demonstration, rainfall measurements from three rain gauge networks near Melbourne, Florida, are used for training and validation. These three gauge networks, which are located in the Kennedy Space Center (KSC), the South Florida Water Management District (SFL), and the St. Johns Water Management District (STJ), include 33, 46, and 99 rain gauge stations, respectively. Collocated ground radar observations from the National Weather Service (NWS) Weather Surveillance Radar – 1988 Doppler (WSR-88D) in Melbourne (i.e., the KMLB radar) are trained with the gauge measurements. The trained model is then used to derive a KMLB-radar-based rainfall product, which is used to train on both TRMM PR and GPM DPR data collected from coincident overpass events. The machine-learning-based rainfall product is compared against the standard satellite products, showing the great potential of the machine learning concept in satellite radar rainfall estimation. The local rain maps generated by the machine learning system in the KMLB area also demonstrate its application potential.

Item Open Access Machine learning-based phishing detection using URL features: a comprehensive review (Colorado State University. Libraries, 2023) Asif, Asif Uz Zaman, author; Ray, Indrakshi, advisor; Shirazi, Hossein, advisor; Ray, Indrajit, committee member; Wang, Haonan, committee member
In a social engineering attack known as phishing, a perpetrator sends a false message to a victim while posing as a trusted representative in an effort to collect private information, such as login passwords and financial information, for personal gain. To successfully carry out a phishing attack, counterfeit websites, emails, and messages are utilized to trick the victim. Machine learning appears to be a promising technique for phishing detection. Typically, website content and Uniform Resource Locator (URL) based features are used. However, gathering website content features requires visiting malicious sites, and preparing the data is labor-intensive. Towards this end, researchers are investigating whether URL-only information can be used for phishing detection. Such approaches are lightweight, can be deployed at the client's end, do not require data collection from malicious sites, and can identify zero-day attacks. We conduct a systematic literature review on URL-based phishing detection. We selected papers that appeared in top conferences and journals in cybersecurity and were either recent (2018 onward) or highly cited (50+ citations in Google Scholar). This survey provides researchers and practitioners with information on the current state of research on URL-based website phishing attack detection methodologies. The results of this study show that the lack of a centralized dataset is actually beneficial, because it prevents attackers from seeing the features that classifiers employ; however, it is time-consuming for researchers. Furthermore, both machine learning and deep learning algorithms can be utilized since they achieve very good classification accuracy, and in this work we found Random Forest and Long Short-Term Memory to be good choices. Using task-specific lexical characteristics rather than concentrating on the number of features is essential for this work, because feature selection impacts how accurately algorithms detect phishing URLs.

Item Open Access Monitoring and characterizing application service availability (Colorado State University. Libraries, 2018) Rammer, Daniel P., author; Papadopoulos, Christos, advisor; Ray, Indrajit, committee member; Hayne, Stephen, committee member
Reliable detection of global application service availability remains an open problem on the Internet. Some availability issues are diagnosable by an administrator monitoring the service locally, but far more may be identified by monitoring user requests (e.g., DNS/SSL misconfiguration). In this work we present Proddle, a distributed application-layer measurement framework. The application periodically submits HTTP(S) requests from geographically diverse vantage points to gather service availability information at the application layer. Using these measurements, we reliably catalog application service unavailability events and identify their causes. Finally, analysis is performed to identify telling event characteristics, including event frequency, duration, and visibility.

Item Open Access Multilevel secure data stream management system (Colorado State University. Libraries, 2013) Xie, Xing, author; Ray, Indrakshi, advisor; Ray, Indrajit, committee member; France, Robert, committee member; Turk, Daniel, committee member
With the advent of mobile and sensor devices, situation monitoring applications are now feasible. Such a data processing system should be able to collect large amounts of data at high input rates, compute results on the fly, and take actions in real time. Data Stream Management Systems (DSMSs) have been proposed to address those needs. In a DSMS, the infinite input data is divided by arrival timestamps and buffered in input windows, and queries are processed against the finite data in a fixed-size window. The output results are updated continuously by timestamp. However, monitoring applications often generate data streams at various sensitivity levels that should be processed without security breaches, and current DSMSs cannot prevent illegal information flow when processing inputs and queries from different levels. We have developed multilevel secure (MLS) stream processing systems that operate on input data with security levels. We have accomplished four tasks: (1) providing a formalization of a model and language for representing secure continuous queries; (2) investigating centralized and distributed architectures able to handle MLS continuous queries, and designing authentication models, query rewriting and optimization mechanisms, and scheduling strategies to ensure that queries are processed in a secure and timely manner; (3) developing sharing approaches between queries to improve quality of service, along with extensible prototypes and experiments to compare the performance of different processing strategies and architectures; and (4) proposing an information flow control model adapted from the Chinese Wall policy that can be used to protect against sensitive data disclosure, as an extension of the multilevel secure DSMS for stream audit applications.

Item Open Access On component-oriented access control in lightweight virtualized server environments (Colorado State University. Libraries, 2017) Belyaev, Kirill, author; Ray, Indrakshi, advisor; Ray, Indrajit, committee member; Malaiya, Yashwant, committee member; Vijayasarathy, Leo, committee member
With the advancements in contemporary multi-core CPU architectures and the increase in main memory capacity, it is now possible for a server operating system (OS), such as Linux, to handle a large number of concurrent services on a single server instance. Individual components of such services may run in different isolated runtime environments, such as chrooted jails or related forms of OS-level containers, and may need restricted access to system resources and the ability to share data and coordinate with each other in a regulated and secure manner. In this dissertation we describe our work on an access control framework for policy formulation, management, and enforcement that allows access to OS resources and also permits controlled data sharing and coordination for service components running in disjoint containerized environments within a single Linux OS server instance. The framework consists of two models, and the policy formulation is based on the concept of policy classes for ease of administration and enforcement.
The policy classes are managed and enforced through a Lightweight Policy Machine for Linux (LPM) that acts as the centralized reference monitor and provides a uniform interface for regulating access to system resources and requesting data and control objects. We present the details of our framework and also discuss the preliminary implementation and evaluation to demonstrate the feasibility of our approach.

Item Open Access Quantifying the security risk of discovering and exploiting software vulnerabilities (Colorado State University. Libraries, 2016) Mussa, Awad A. Younis, author; Malaiya, Yashwant, advisor; Ray, Indrajit, committee member; Anderson, Charles W., committee member; Vijayasarathy, Leo, committee member
Most of the attacks on computer systems and networks are enabled by vulnerabilities in software. Assessing the security risk associated with those vulnerabilities is important. Risk models such as the Common Vulnerability Scoring System (CVSS), Open Web Application Security Project (OWASP), and Common Weakness Scoring System (CWSS) have been used to qualitatively assess the security risk presented by a vulnerability. CVSS is the de facto standard, and its metrics need to be independently evaluated. In this dissertation, we propose a quantitative approach that uses actual data, mathematical and statistical modeling, data analysis, and measurement. We have introduced a novel vulnerability discovery model, the Folded model, which estimates the risk of vulnerability discovery based on the number of residual vulnerabilities in a given piece of software. In addition to estimating the risk of vulnerability discovery for a whole system, this dissertation introduces a novel metric termed time to vulnerability discovery to assess the risk of discovery of an individual vulnerability. We have also proposed a novel vulnerability exploitability risk measure termed Structural Severity. It is based on software properties, namely attack entry points, vulnerability location, the presence of dangerous system calls, and reachability analysis. In addition to measurement, this dissertation also proposes predicting vulnerability exploitability risk using internal software metrics. We have also proposed two approaches for evaluating the CVSS Base metrics. Using the availability of exploits, we first evaluated the performance of the CVSS Exploitability factor and compared it to the Microsoft (MS) rating system. The results showed that the exploitability metrics of CVSS and MS have a high false positive rate. This finding motivated us to conduct further investigation. To that end, we introduced vulnerability reward programs (VRPs) as a novel ground truth to evaluate the CVSS Base scores. The results show that the notable lack of exploits for high-severity vulnerabilities may be the result of prioritized fixing of vulnerabilities.

Item Open Access Quantitative analyses of software vulnerabilities (Colorado State University. Libraries, 2011) Joh, HyunChul, author; Malaiya, Yashwant K., advisor; Ray, Indrajit, committee member; Ray, Indrakshi, committee member; Jayasumana, Anura P., committee member
There have been numerous studies addressing computer security and software vulnerability management. Most of the time, they have taken a qualitative perspective. In many other disciplines, quantitative analyses have been indispensable for performance assessment, metric measurement, functional evaluation, and statistical modeling.
Quantitative approaches can also help improve software risk management by providing data-driven guidelines for the optimal allocation of resources for security testing, scheduling, and the development of security patches. Quantitative methods allow more objective and accurate estimates of future trends than qualitative methods alone, because a quantitative approach applies statistical methods to real datasets, which has proven to be a very powerful prediction approach in several research fields. A quantitative methodology makes it possible for end-users to assess the risks posed by vulnerabilities in software systems, and potential breaches, without being burdened by the details of every individual vulnerability. At the moment, quantitative risk analysis in information security systems is still in its infancy. Recently, however, researchers have started to explore various software-vulnerability-related attributes quantitatively, as the vulnerability datasets have now become large enough for statistical analyses. In this dissertation, quantitative analyses are presented dealing with i) modeling vulnerability discovery processes in major Web servers and browsers, ii) the relationship between the performance of S-shaped vulnerability discovery models and the skew in the vulnerability datasets examined, iii) linear vulnerability discovery trends in multi-version software systems, iv) periodic behavior in the weekly exploitation and patching of vulnerabilities as well as in the long-term vulnerability discovery process, and v) software security risk evaluation with respect to the vulnerability lifecycle and CVSS. Results show good vulnerability discovery model fits and reasonable prediction capabilities for both time-based and effort-based models on datasets from Web servers and browsers. Results also show that the AML and Gamma-distribution-based models perform better than other S-shaped models on left-skewed and right-skewed datasets, respectively. We find that code sharing among successive versions causes a linear discovery pattern. We establish that there are indeed long- and short-term periodic patterns in software-vulnerability-related activities, which had previously been only vaguely recognized by security researchers. Lastly, a framework for software security risk assessment is proposed that allows a comparison of software systems in terms of risk, along with potential approaches for optimizing remediation.
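The S-shaped discovery models referenced above can be illustrated by fitting a logistic-style cumulative curve to vulnerability counts over time. This is a generic sketch with synthetic data; the parameterization below resembles, but is not claimed to be, the exact AML formulation, and none of the dissertation's datasets are reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit

def s_shaped(t, A, B, C):
    """Generic logistic cumulative-discovery curve saturating at B vulnerabilities."""
    return B / (1.0 + C * np.exp(-A * B * t))

# Synthetic cumulative vulnerability counts over 36 months.
months = np.arange(1, 37, dtype=float)
true_counts = s_shaped(months, A=0.002, B=120.0, C=40.0)
observed = true_counts + np.random.default_rng(2).normal(0, 2, months.size)

# Fit the model and use it to extrapolate the discovery trend.
params, _ = curve_fit(s_shaped, months, observed, p0=[0.001, 100.0, 10.0], maxfev=10000)
A_hat, B_hat, C_hat = params
print(f"estimated saturation level: {B_hat:.1f} vulnerabilities")
print(f"predicted cumulative count at month 48: {s_shaped(48.0, *params):.1f}")
```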