Browsing by Author "Vijayasarathy, Leo, committee member"
Now showing 1 - 14 of 14
Item Open Access: Automatic detection of constraints in software documentation (Colorado State University. Libraries, 2021) Ghosh, Joy, author; Moreno Cubillos, Laura, advisor; Ghosh, Sudipto, committee member; Vijayasarathy, Leo, committee member
Software documentation is an important resource when maintaining and evolving software, as it supports developers in program understanding. To keep it up to date, developers need to verify that all the constraints affected by a change in source code are consistently described in the documentation. The process of detecting all the constraints in the documentation and cross-checking the constraints in the source code is time-consuming. An approach capable of automatically identifying software constraints in documentation could facilitate the process of detecting constraints, which are necessary to cross-check documentation and source code. In this thesis, we explore different machine learning algorithms to build binary classification models that assign sentences extracted from software documentation to one of two categories: constraints and non-constraints. The models are trained on a data set that consists of 368 manually tagged sentences from four open-source software systems. We evaluate the performance of the different models (Decision tree, Naive Bayes, Support Vector Machine, Fine-tuned BERT) based on precision, recall and F1-score. Our best model (i.e., a decision tree featuring bigrams) achieved 74.0% precision, 83.8% recall and an F1-score of 0.79. This suggests that our results are promising and that it is possible to build machine-learning-based models for the automatic detection of constraints in software documentation.
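As an editorial illustration of the kind of bigram-featured decision tree classifier reported above (not the thesis's actual pipeline; the sentences, labels, and feature settings are invented), a minimal scikit-learn sketch:

    # Illustrative sketch (not the thesis pipeline): a decision tree over
    # unigram+bigram features that labels sentences as constraint (1) or not (0).
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.tree import DecisionTreeClassifier

    sentences = [
        "The value of timeout must be a positive integer.",    # constraint
        "This module parses configuration files at startup.",  # non-constraint
        "The buffer size must not exceed 4096 bytes.",          # constraint
        "The logger writes messages to standard output.",       # non-constraint
    ]
    labels = [1, 0, 1, 0]

    model = Pipeline([
        ("ngrams", CountVectorizer(ngram_range=(1, 2))),  # unigrams and bigrams
        ("tree", DecisionTreeClassifier(random_state=0)),
    ])
    model.fit(sentences, labels)

    print(model.predict(["The port number must be between 1024 and 65535."]))  # [1]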
Item Open Access: CPS security testbed: requirement analysis, prototype design and protection framework (Colorado State University. Libraries, 2023) Talukder, Md Rakibul Hasan, author; Ray, Indrajit, advisor; Malaiya, Yashwant, committee member; Vijayasarathy, Leo, committee member
Testbeds are a practical way to perform security exercises on cyber physical systems (CPS) to understand vulnerabilities and the progression/impact of cyber-attacks. However, it is challenging to replicate a large CPS, such as a nuclear power plant or an electrical power grid, within the confines of a laboratory that would allow security experiments to be carried out. Thus, software-based simulations are becoming increasingly popular, as opposed to hardware-in-the-loop simulations, for CPS that form a critical infrastructure. Unfortunately, a software-based CPS testbed oriented towards security-centric experiments requires a careful re-examination of requirements and an architectural design different from that of a CPS testbed for non-security-related experiments. On a security-focused testbed there is a need to run real attack scripts for red-teaming/blue-teaming exercises, which are, in the strictest sense of the term, malicious in nature. Thus, there is a need to protect the testbed itself from these attack experiments, which have the potential to go awry. The overall effect of an exploit on the whole system, and vulnerabilities in communication channels, need to be explored in particular while building a simulator for a security-centric CPS. Besides, when multiple experiments are conducted on the same testbed, there is a need to maintain isolation among them so that no experiment can accidentally or maliciously compromise others and affect the fidelity of their results. Specific supports for security experiments are essential when designing such a testbed, but integrating a software-based simulator within the testbed to provide the necessary experiment support is challenging. In this thesis, we make three contributions. First, we present the design of an ideal testbed based on a set of requirements and supports that we have identified, focusing specifically on security experiments as the primary use case. Next, following this requirements analysis, we integrate a software-based simulator (Generic Pressurized Water Reactor) into a testbed design by modifying the implementation architecture to allow the execution of attack experiments on different networking architectures and protocols. Finally, we describe a novel security architecture and framework to ensure the protection of security-related experiments on a CPS testbed.

Item Open Access: Integrating MBSAP with continuous improvement for developing resilient healthcare systems (Colorado State University. Libraries, 2021) Speece, Jill E., author; Eftekhari Shahroudi, Kamran, advisor; Herber, Dan, committee member; Borky, Mike, committee member; Vijayasarathy, Leo, committee member
The high cost of healthcare is a well-known topic. Utilizing systems engineering methods to address the problem is less well known in the healthcare industry. There are many variables that impact the cost of healthcare, and this dissertation proposes a solution for the systemic problem of same-day missed appointments. Healthcare systems have had success using Continuous Improvement (CI) tools and methods to change and improve processes, but the use of CI tools alone has not yet produced a sustained solution for same-day missed appointments. Robust healthcare systems are driven by the architecture. Through utilization of the Model-Based Systems Architecture Process (MBSAP), an architecture was developed to automate utilization management and ultimately reduce the impact of same-day missed appointments. During the needs analysis phase of system development, the history of the problem at an outpatient imaging center was studied and initial experiments for system feasibility were performed. It was found that elements of the architecture are feasible but needed to be more fully developed before implementation. Benchmarking against other service-oriented industries provided additional context for the problem and a set of alternatives for subsystems within the architecture. These two efforts also resulted in the overarching system objective to create a solution that does not rely on changing patient behavior. Since the outpatient imaging center is a sociotechnical system, four social dimensions – the customer dimension, the planning dimension, the operations dimension, and the technical dimension – were defined and analyzed to find the right balance between alternative architectures for the diverse set of stakeholders' needs. A subdomain that included the creation of a master dataset, a visual dashboard, and a predictive model was fully developed by integrating CI methodologies with MBSAP. The proposed architecture includes automating the integration of the results of the predictive model with existing systems, but this piece of the architecture is still under development. In manually simulating how the results would change internal workflows to provide proactive targeted interventions, a 17% improvement ($260k) in the annual cost (~$1.5M) of same-day missed appointments for the outpatient imaging center was realized.
MBSAP has been invaluable in adding systemic and systematic rigor to the complex real-world problem of same-day missed appointments in an outpatient imaging center. The resulting systems architecture ensures that the needs of all stakeholders are met while anticipating potential unintended consequences.

Item Open Access: Modeling and querying uncertain data for activity recognition systems using PostgreSQL (Colorado State University. Libraries, 2012) Burnett, Kevin, author; Draper, Bruce, advisor; Ray, Indrakshi, advisor; Vijayasarathy, Leo, committee member
Activity Recognition (AR) systems interpret events in video streams by identifying actions and objects and combining these descriptors into events. Relational databases can be used to model AR systems by describing the entities and the relationships between them. This thesis presents a relational data model for storing the actions and objects extracted from video streams. Since AR is a sequential labeling task, where a system labels images from video streams, errors will be produced because the interpretation process is not always temporally consistent with the world. This thesis proposes a PostgreSQL function that uses the Viterbi algorithm to temporally smooth labels over sequences of images and to identify track windows, or sequential images that share the same actions and objects. The experiment design tests the effects that the number of sequential images, the label count, and the data size have on execution time for identifying track windows. The results from these experiments show that label count is the dominant factor in the execution time.
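To make the temporal-smoothing step concrete, here is a minimal plain-Python Viterbi sketch (an editorial illustration, not the thesis's PostgreSQL function; the labels, transition matrix, and per-frame scores are invented placeholders):

    # Minimal Viterbi sketch: smooth noisy per-frame label scores into the most
    # likely label sequence (invented labels and probabilities).
    import numpy as np

    labels = ["walk", "run"]
    trans = np.array([[0.9, 0.1],          # P(next | current): favors staying put
                      [0.1, 0.9]])
    emit = np.array([[0.8, 0.2],           # per-frame scores for walk/run
                     [0.6, 0.4],
                     [0.3, 0.7],           # a noisy frame that favors "run"
                     [0.7, 0.3]])

    T, K = emit.shape
    delta = np.zeros((T, K))               # best log-score ending in each label
    back = np.zeros((T, K), dtype=int)     # backpointers
    delta[0] = np.log(emit[0]) + np.log(1.0 / K)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(trans) + np.log(emit[t])[None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)

    path = [int(delta[-1].argmax())]       # backtrack from the best final label
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    print([labels[i] for i in reversed(path)])   # the noisy frame is smoothed away

In the thesis, this smoothing is performed inside a PostgreSQL function over label sequences stored in the relational model.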
Item Open Access: Networks and trust: systems for understanding and supporting internet security (Colorado State University. Libraries, 2022) Boots, Bryan Charles, author; Simske, Steve J., advisor; Abdunabi, Ramadan, committee member; Jayasumana, Anura, committee member; Vijayasarathy, Leo, committee member
This dissertation takes a systems-level view of the multitude of existing trust management systems to make sense of when, where and how (or, in some cases, if) each is best utilized. Trust is a belief by one person that by transacting with another person (or organization) within a specific context, a positive outcome will result. Trust serves as a heuristic that enables us to simplify the dozens of decisions we make each day about whom we will transact with. In today's hyperconnected world, in which for many people the bulk of their daily transactions related to business, entertainment, news, and even critical services like healthcare take place online, we tend to rely even more on heuristics like trust to help us simplify complex decisions. Thus, trust plays a critical role in online transactions. For this reason, over the past several decades researchers have developed a plethora of trust metrics and trust management systems for use in online systems. These systems have been most frequently applied to improve recommender systems and reputation systems. They have been designed for and applied to varied online systems including peer-to-peer (P2P) filesharing networks, e-commerce platforms, online social networks, messaging and communication networks, sensor networks, distributed computing networks, and others. However, comparatively little research has examined the effects on individuals, organizations or society of the presence or absence of trust in online sociotechnical systems. Using these existing trust metrics and trust management systems, we design a set of experiments to benchmark the performance of these existing systems, which rely heavily on network analysis methods. Drawing on the experiments' results, we propose a heuristic decision-making framework for selecting a trust management system for use in online systems. In this dissertation we also investigate several related but distinct aspects of trust in online sociotechnical systems. Using network/graph analysis methods, we examine how trust (or lack of trust) affects the performance of online networks in terms of security and quality of service. We explore the structure and behavior of online networks including Twitter, GitHub, and Reddit through the lens of trust. We find that higher levels of trust within a network are associated with more spread of misinformation (a form of cybersecurity threat, according to the US CISA) on Twitter. We also find that higher levels of trust in open source developer networks on GitHub are associated with more frequent incidences of cybersecurity vulnerabilities. Using our experimental and empirical findings previously described, we apply the Systems Engineering Process to design and prototype a trust management tool for use on Reddit, which we dub Coni the Trust Moderating Bot. Coni is, to the best of our knowledge, the first trust management tool designed specifically for use on the Reddit platform. Through our work with Coni, we develop and present a blueprint for constructing a Reddit trust tool which not only measures trust levels, but can use these trust levels to take actions on Reddit to improve the quality of submissions within the community (a subreddit).
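As a generic illustration of the graph-based trust metrics this dissertation surveys (not Coni or any specific system evaluated in it; the nodes, edge weights, and update rule are invented), a naive trust-propagation pass might look like this:

    # Toy trust propagation over a small directed network (invented data):
    # a node's trust is the average of its incoming neighbours' trust,
    # discounted by how much each neighbour trusts it.
    import networkx as nx

    G = nx.DiGraph()
    G.add_weighted_edges_from([
        ("alice", "bob", 0.9),     # alice trusts bob strongly
        ("alice", "carol", 0.4),
        ("bob", "carol", 0.8),
        ("carol", "dave", 0.2),    # carol barely trusts dave
    ])

    trust = {n: 0.5 for n in G}    # neutral prior
    trust["alice"] = 1.0           # seed node assumed fully trustworthy

    for _ in range(10):            # simple fixed-point iteration
        updated = {}
        for node in G:
            preds = list(G.predecessors(node))
            if not preds:
                updated[node] = trust[node]
                continue
            updated[node] = sum(trust[p] * G[p][node]["weight"] for p in preds) / len(preds)
        trust = updated

    print({n: round(v, 2) for n, v in trust.items()})
    # roughly: alice 1.0, bob 0.9, carol 0.56, dave 0.11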
We present the details of our framework and also discuss the preliminary implementation and evaluation to demonstrate the feasibility of our approach.

Item Open Access: On component-oriented access control in lightweight virtualized server environments (Colorado State University. Libraries, 2017) Belyaev, Kirill, author; Ray, Indrakshi, advisor; Ray, Indrajit, committee member; Malaiya, Yashwant, committee member; Vijayasarathy, Leo, committee member
With the advancements in contemporary multi-core CPU architectures and the increase in main memory capacity, it is now possible for a server operating system (OS), such as Linux, to handle a large number of concurrent services on a single server instance. Individual components of such services may run in different isolated runtime environments, such as chrooted jails or related forms of OS-level containers, and may need restricted access to system resources and the ability to share data and coordinate with each other in a regulated and secure manner. In this dissertation we describe our work on an access control framework for policy formulation, management, and enforcement that allows access to OS resources and also permits controlled data sharing and coordination for service components running in disjoint containerized environments within a single Linux OS server instance. The framework consists of two models, and the policy formulation is based on the concept of policy classes for ease of administration and enforcement. The policy classes are managed and enforced through a Lightweight Policy Machine for Linux (LPM) that acts as the centralized reference monitor and provides a uniform interface for regulating access to system resources and requesting data and control objects.

Item Open Access: On designing large, secure and resilient networked systems (Colorado State University. Libraries, 2019) Mulamba Kadimbadimba, Dieudonné, author; Ray, Indrajit, advisor; Ray, Indrakshi, committee member; McConnell, Ross, committee member; Vijayasarathy, Leo, committee member
Defending large networked systems against rapidly evolving cyber attacks is challenging for several reasons. First, cyber defenders are always fighting asymmetric warfare: while the attacker needs to find just a single unprotected security vulnerability to launch an attack, the defender needs to identify and protect against all possible avenues of attack on the system. Various cost factors, such as, but not limited to, costs related to identifying and installing defenses, security management, manpower training and development, and system availability, make this asymmetric warfare even more challenging. Second, newer and newer cyber threats are always emerging - the so-called zero-day attacks. It is not possible for a cyber defender to defend against an attack for which defenses are yet unknown. In this work, we investigate the problem of designing large and complex networks that are secure and resilient. There are two specific aspects of the problem that we look into. The first is the problem of detecting anomalous activities in the network. While this problem has been investigated in various ways, we address it differently. We posit that anomalous activities are the result of mal-actors interacting with non-mal-actors, and such anomalous activities are reflected in changes to the topological structure (in a mathematical sense) of the network. We formulate this problem as that of Sybil detection in networks. For our experimentation and hypothesis testing we instantiate the problem as that of Sybil detection in online social networks (OSNs). Sybil attacks involve one or more attackers creating and introducing several mal-actors (fake identities in online social networks), called Sybils, into a complex network. Depending on the nature of the network system, the goal of the mal-actors can be to unlawfully access data, to forge another user's identity and activity, or to influence and disrupt the normal behavior of the system. The second aspect that we look into is that of building resiliency in a large network that consists of several machines that collectively provide a single service to the outside world. Such networks are particularly vulnerable to Sybil attacks. While our Sybil detection algorithms achieve very high levels of accuracy, they cannot guarantee that all Sybils will be detected. Thus, to protect against such "residual" Sybils (that is, those that remain potentially undetected and continue to attack the network services), we propose a novel Moving Target Defense (MTD) paradigm to build resilient networks. The core idea is that for large, enterprise-level networks, the survivability of the network's mission is more important than the security of one or more of the servers. We develop protocols to relocate services from server to server in a random way such that, before an attacker has an opportunity to target a specific server and disrupt its services, the services will have migrated to another non-malicious server.
The continuity of the service of the large network is thus sustained. We evaluate the effectiveness of our proposed protocols using theoretical analysis, simulations, and experimentation. For the Sybil detection problem we use both synthetic and real-world data sets, and we evaluate the algorithms for accuracy of Sybil detection. For the moving target defense protocols we implement a proof-of-concept in the context of access control as a service, and run several large-scale simulations. The proof-of-concept demonstrates the effectiveness of the MTD paradigm. We evaluate the computation and communication complexity of the protocols as we scale up to larger and larger networks.
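A bare-bones sketch of the moving-target idea described above, in which a service is periodically relocated to a randomly chosen server (an editorial illustration, not the dissertation's protocols; the host names and dwell times are placeholders):

    # Toy moving-target defense loop: periodically relocate a service to a
    # randomly chosen server so an attacker cannot lock onto a fixed target.
    import random
    import time

    SERVERS = ["srv-a", "srv-b", "srv-c", "srv-d"]    # placeholder hosts

    def relocate(service: str, current: str) -> str:
        """Pick a new host for the service, never the one it is on now."""
        target = random.choice([s for s in SERVERS if s != current])
        print(f"migrating {service}: {current} -> {target}")
        # a real protocol would also transfer state and update service discovery
        return target

    host = "srv-a"
    for _ in range(5):                                # demo: five migration rounds
        time.sleep(random.uniform(0.1, 0.5))          # randomized dwell time
        host = relocate("access-control-service", host)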
Item Open Access: On the design of a secure and anonymous publish-subscribe system (Colorado State University. Libraries, 2012) Mulamba Kadimbadimba, Dieudonne, author; Ray, Indrajit, advisor; Ray, Indrakshi, committee member; Vijayasarathy, Leo, committee member
The reliability and high availability of data have made online servers very popular among single users and organizations like hospitals, insurance companies or administrations. This has led to an increased dissemination of personal data on public servers. These online companies are increasingly adopting publish-subscribe as a new model for storing and managing data on a distributed network. While bringing real improvement in the way these online companies store and manage data in a dynamic and distributed environment, publish-subscribe also brings new challenges of security and privacy. The centralization of personal data on public servers has raised citizens' concerns about their privacy. Several security breaches involving the leakage of personal data have occurred, showing us how crucial this issue has become. A significant amount of work has been done in the field of securing publish-subscribe systems. However, all of this research assumes that the server is a trusted entity, an assumption which ignores the fact that the server can be honest but curious. This leads to the need to develop a means to protect publishers and subscribers from server curiosity. One solution to this problem could be to anonymize all communications involving publishers and subscribers. This solution in turn raises another issue: how to allow a subscriber to query a file that was anonymously uploaded to the server by the publisher. In this work, we propose an implementation of a communication protocol that allows users to asynchronously and anonymously exchange messages and that also supports secure deletion of messages.

Item Open Access: Preservation of low latency service request processing in dockerized microservice architectures (Colorado State University. Libraries, 2016) Sudalaikkan, Leo Vigneshwaran, author; Pallickara, Shrideep, advisor; Pallickara, Sangmi Lee, committee member; Vijayasarathy, Leo, committee member
Organizations are increasingly transitioning from monolithic architectures to microservices-based architectures. Software built as microservices can be broken into multiple components that are easily deployable and scalable while providing good utilization of resources. A popular approach to building microservices is through containers. Docker is an open source technology for building, deploying, and executing distributed applications within containers, which are referred to as pods in Docker orchestrator terminology. The objective of this thesis is the dynamic and targeted scaling of the pods comprising an application to ensure low-latency servicing of requests. Our methodology targets the identification of impending latency constraint violations and performs targeted scaling maneuvers to alleviate load at a particular pod. Empirical benchmarks demonstrate the suitability of our approach.
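As an illustration of the kind of targeted scaling decision described (not the thesis's actual mechanism; the services, latency constraints, and scale-up callback are hypothetical), a sketch that scales only the pod group whose tail latency approaches its constraint:

    # Toy targeted-scaling check: scale only the service whose p95 latency is
    # approaching its constraint (hypothetical numbers and scale-up callback).
    LATENCY_SLA_MS = {"checkout": 200, "catalog": 150}   # per-service constraints
    HEADROOM = 0.8                                       # act at 80% of the limit

    def p95(samples):
        ordered = sorted(samples)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def scale_up(service, extra_replicas=1):
        # stand-in for an orchestrator call that raises the replica/pod count
        print(f"scaling {service} by +{extra_replicas} replica(s)")

    observed_ms = {
        "checkout": [120, 130, 190, 170, 185, 178, 165],  # creeping toward 200 ms
        "catalog": [40, 55, 60, 48, 52, 45, 58],          # healthy
    }

    for service, samples in observed_ms.items():
        if p95(samples) >= HEADROOM * LATENCY_SLA_MS[service]:
            scale_up(service)                             # targeted, not global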
Item Open Access: Privacy preserving linkage and sharing of sensitive data (Colorado State University. Libraries, 2018) Lazrig, Ibrahim Meftah, author; Ray, Indrakshi, advisor; Ray, Indrajit, advisor; Malaiya, Yashwant, committee member; Vijayasarathy, Leo, committee member; Ong, Toan, committee member
Sensitive data, such as personal and business information, is collected by many service providers nowadays. This data is considered a rich source of information for research purposes that could benefit individuals, researchers and service providers. However, because of the sensitivity of such data, privacy concerns, legislation, and conflicts of interest, data holders are reluctant to share their data with others. Data holders typically filter out or obliterate privacy-related sensitive information from their data before sharing it, which limits the utility of this data and affects the accuracy of research. Such practice protects individuals' privacy; however, it prevents researchers from linking records belonging to the same individual across different sources. This is commonly referred to as the record linkage problem by the healthcare industry. In this dissertation, our main focus is on designing and implementing efficient privacy preserving methods that will encourage sensitive information sources to share their data with researchers without compromising the privacy of the clients or affecting the quality of the research data. The proposed solution should be scalable and efficient for real-world deployments and provide good privacy assurance. While this problem has been investigated before, most of the proposed solutions were either partial solutions, not accurate, or impractical, and therefore subject to further improvement. We have identified several issues and limitations in the state-of-the-art solutions and provide a number of contributions that improve upon them. Our first contribution is the design of a privacy preserving record linkage protocol using a semi-trusted third party. The protocol allows a set of data publishers (data holders) who compete with each other to share sensitive information with subscribers (researchers) while preserving the privacy of their clients and without sharing encryption keys. Our second contribution is the design and implementation of a probabilistic privacy preserving record linkage protocol that accommodates discrepancies and errors in the data, such as typos. This work builds upon the previous work by linking records that are similar, where the similarity range is formally defined. Our third contribution is a protocol that performs information integration and sharing without third-party services. We use garbled-circuit secure computation to design and build a system that performs record linkage between two parties without sharing their data. Our design uses Bloom filters as inputs to the garbled circuits and performs probabilistic record linkage using the Dice coefficient similarity measure. As garbled circuits are known for their expensive computations, we propose new approaches that reduce the computation overhead needed to achieve a given level of privacy. We built a scalable record linkage system using garbled circuits that could be deployed in a distributed computation environment like the cloud, and evaluated its security and performance. One of the performance issues in linking large datasets is the amount of secure computation needed to compare every pair of records across the linked datasets to find all possible record matches. To reduce the amount of computation, a method known as blocking is used to filter out as many as possible of the record pairs that will not match, and to limit the comparison to a subset of the record pairs (called candidate pairs) that possibly match. Most current blocking methods either require the parties to share blocking keys (called block identifiers), extracted from the domain of some record attributes (termed blocking variables), or to share reference data points and group their records around these points using some similarity measure. Though these methods reduce the computation substantially, they leak too much information about the records within each block. Toward this end, we propose a novel privacy preserving approximate blocking scheme that allows parties to generate the list of candidate pairs with high accuracy, while protecting the privacy of the records in each block. Our scheme is configurable such that the level of performance and accuracy can be tuned according to the required level of privacy. We analyzed the accuracy and privacy of our scheme, implemented a prototype of the scheme, and experimentally evaluated its accuracy and performance against different levels of privacy.
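The Bloom-filter comparison with the Dice coefficient mentioned above can be illustrated in a few lines. This is a plaintext sketch with arbitrary filter parameters; in the actual protocols the comparison is evaluated inside garbled circuits so that neither party sees the other's filter:

    # Plaintext sketch of Bloom-filter record comparison with the Dice
    # coefficient (arbitrary filter size and hash count; no garbled circuits).
    # Each filter is represented here by the set of its 1-bit positions.
    import hashlib

    M, K = 256, 4                        # filter length in bits, hashes per bigram

    def bigrams(s: str):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    def bloom(s: str):
        bits = set()
        for gram in bigrams(s):
            for k in range(K):
                digest = hashlib.sha256(f"{k}:{gram}".encode()).hexdigest()
                bits.add(int(digest, 16) % M)
        return bits

    def dice(a, b):
        return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

    rec_a, rec_b = bloom("jon smith"), bloom("john smith")   # a typo-level difference
    print(round(dice(rec_a, rec_b), 3))                      # high similarity score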
Item Open Access: Quantifying the security risk of discovering and exploiting software vulnerabilities (Colorado State University. Libraries, 2016) Mussa, Awad A. Younis, author; Malaiya, Yashwant, advisor; Ray, Indrajit, committee member; Anderson, Charles W., committee member; Vijayasarathy, Leo, committee member
Most attacks on computer systems and networks are enabled by vulnerabilities in software. Assessing the security risk associated with those vulnerabilities is important. Risk models such as the Common Vulnerability Scoring System (CVSS), Open Web Application Security Project (OWASP) and Common Weakness Scoring System (CWSS) have been used to qualitatively assess the security risk presented by a vulnerability. CVSS is the de facto standard, and its metrics need to be independently evaluated. In this dissertation, we propose a quantitative approach that uses actual data, mathematical and statistical modeling, data analysis, and measurement. We introduce a novel vulnerability discovery model, the Folded model, that estimates the risk of vulnerability discovery based on the number of residual vulnerabilities in a given piece of software. In addition to estimating the risk of vulnerability discovery for a whole system, this dissertation introduces a novel metric, termed time to vulnerability discovery, to assess the risk of an individual vulnerability being discovered. We also propose a novel vulnerability exploitability risk measure, termed Structural Severity, which is based on software properties, namely attack entry points, vulnerability location, the presence of dangerous system calls, and reachability analysis. In addition to measurement, this dissertation also proposes predicting vulnerability exploitability risk using internal software metrics. We also propose two approaches for evaluating the CVSS Base metrics. Using the availability of exploits, we first evaluated the performance of the CVSS Exploitability factor and compared it to the Microsoft (MS) rating system. The results showed that the exploitability metrics of CVSS and MS have a high false positive rate. This finding motivated us to conduct further investigation. To that end, we introduced vulnerability reward programs (VRPs) as a novel ground truth for evaluating the CVSS Base scores. The results show that the notable lack of exploits for high severity vulnerabilities may be the result of prioritized fixing of vulnerabilities.

Item Open Access: Testing with state variable data-flow criteria for aspect-oriented programs (Colorado State University. Libraries, 2011) Wedyan, Fadi, author; Ghosh, Sudipto, advisor; Bieman, James M., committee member; Malaiya, Yashwant K., committee member; Vijayasarathy, Leo, committee member
Data-flow testing approaches have been used for procedural and object-oriented (OO) programs, and have been empirically shown to be effective in detecting faults. However, few such approaches have been proposed for aspect-oriented (AO) programs. In an AO program, data-flow interactions can occur between the base classes and aspects, which can affect the behavior of both. Faults resulting from such interactions are hard to detect unless the interactions are specifically targeted during testing. In this research, we propose a data-flow testing approach for AO programs. In an AO program, an aspect and a base class interact either through parameters passed from advised methods in the base class to the advice, or through the direct reading and writing of the base class state variables in the advice. We identify a group of def-use associations (DUAs) that are based on the base class state variables and propose a set of data-flow test criteria that require executing these DUAs. We identify fault types that result from incorrect data-flow interactions in AO programs and extend an existing AO fault model to include these faults. We implemented our approach in a tool that identifies the DUAs targeted by the proposed criteria, runs a test suite, and computes the coverage results. We conducted an empirical study that compares the cost and effectiveness of the proposed criteria with two control-flow criteria. The empirical study was performed using four subject programs. We seeded faults in the programs using three mutation tools: AjMutator, Proteum/AJ, and μJava. We used a test generation tool, called RANDOOP, to generate a pool of random test cases. To produce a test suite that satisfies a criterion, we randomly selected test cases from the test pool until the required coverage for the criterion was reached. We evaluated three dimensions of the cost of a test criterion. The first dimension is the size of a test suite that satisfies a test criterion, measured by the number of test cases in the test suite. The second dimension is the density of a test case, measured by the number of test cases in the test suite divided by the number of test requirements. The third dimension is the time needed to randomly obtain a test suite that satisfies a criterion, measured by (1) the number of iterations required by the test suite generator to randomly select test cases from the pool until the criterion is satisfied, and (2) the number of iterations per test requirement. Effectiveness is measured by the mutation scores of the test suites that satisfy a criterion.
We evaluated effectiveness for all faults and for each fault type. Our results show that test suites that cover all the DUAs of state variables are more effective in revealing faults than those satisfying the control-flow criteria; however, they cost more in terms of test suite size and effort. The results also show that test suites covering state variable DUAs in advised classes are suitable for detecting most of the fault types in the revised AO fault model. Finally, we evaluated the cost-effectiveness of the test suites that cover all state variable DUAs at three coverage levels: 100%, 90%, and 80%. The results show that the test suites covering 90% of the state variable DUAs are the most cost-effective.

Item Open Access: Towards efficient implementation of attribute-based access control (Colorado State University. Libraries, 2021) Pagadala, Vignesh M., author; Ray, Indrakshi, advisor; Ray, Indrajit, committee member; Anderson, Charles, committee member; Vijayasarathy, Leo, committee member
Attribute-Based Access Control (ABAC) is a methodology that permits or prohibits a subject (user or process) from performing actions on an object (resource) based upon the attributes of the subject and the object. The inherent versatility of ABAC, as opposed to other access control methods such as Role-Based Access Control (RBAC), has produced a wide range of use cases, including but not limited to healthcare, finance, government and the military. Of late, more and more organizations are settling on ABAC as their access control scheme of choice. To implement ABAC, standards such as the eXtensible Access Control Markup Language (XACML) and Next-Generation Access Control (NGAC) are typically employed. Though these standards allow organizations to implement an access control scheme that is fine-grained, easily manageable and devoid of problems such as role explosion, certain bottlenecks still exist in terms of the time taken to evaluate access requests and the pre-computations performed to prepare the mechanism for answering queries. These issues become apparent only when the number of entities involved in the organization (subjects and objects) begins to scale. Previous works based on NGAC have been proposed that ensure efficient evaluation of access requests. However, these procedures require pre-computations whose time complexity scales rapidly with the growing number of entities and policies. We argue that this can be done better through the dexterous use of specific data structures. Our ABAC implementation (using NGAC) not only answers queries in O(1) time, but also speeds up the pre-computation process to practicable levels, thereby making it more suitable for implementation. We also propose secondary contributions: a mechanism to respond to access requests while a policy update is underway, and procedures to enforce policies from a subset of several policy classes.
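A toy illustration of the constant-time lookup idea (not the NGAC-based algorithm developed in the thesis; the attributes and rules are invented): precompute a hash map keyed by attribute/action combinations, then answer each access request with a single dictionary probe:

    # Toy O(1) ABAC decision: precompute a dictionary keyed by
    # (subject attribute, object attribute, action); answering a request is
    # then a single hash lookup. The attributes and rules are invented.
    POLICY_RULES = [
        ("doctor", "medical-record", "read"),
        ("doctor", "medical-record", "write"),
        ("nurse", "medical-record", "read"),
        ("auditor", "billing-record", "read"),
    ]

    # Pre-computation: build the lookup table once (cost grows with policy size).
    DECISION_TABLE = {rule: True for rule in POLICY_RULES}

    def is_permitted(subject_attr, object_attr, action):
        """Answer an access request with one O(1) dictionary probe."""
        return DECISION_TABLE.get((subject_attr, object_attr, action), False)

    print(is_permitted("nurse", "medical-record", "read"))    # True
    print(is_permitted("nurse", "medical-record", "write"))   # False (default deny)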
Item Open Access: Towards model-based regression test selection (Colorado State University. Libraries, 2019) Al-Refai, Mohammed, author; Ghosh, Sudipto, advisor; Cazzola, Walter, advisor; Bieman, James M., committee member; Ray, Indrakshi, committee member; Vijayasarathy, Leo, committee member
Modern software development processes often use UML models to plan and manage the evolution of software systems. Regression testing is important to ensure that the evolution or adaptation did not break existing functionality. Regression testing can be expensive and is performed with limited resources and under time constraints. Regression test selection (RTS) approaches are used to reduce the cost. RTS is performed by analyzing the changes made to a system at the code or model level. Existing model-based RTS approaches that use UML models have some limitations. They do not take into account the impact of changes to the inheritance hierarchy of the classes on test case selection. They use behavioral models to perform impact analysis and obtain traceability links between model elements and test cases. However, in practice, structural models such as class diagrams are most commonly used for designing and maintaining applications. Behavioral models are rarely used, and even when they are used, they tend to be incomplete and lack the fine-grained details needed to obtain the traceability links, which limits the applicability of the existing UML-based RTS approaches. The goal of this dissertation is to address these limitations and improve the applicability of model-based RTS in practice. To achieve this goal, we proposed a new model-based RTS approach called FLiRTS 2. The development of FLiRTS 2 was driven by our experience accrued from two model-based RTS approaches. The first approach, called MaRTS, was proposed to incorporate information related to inheritance hierarchy changes into test case selection. MaRTS is based on UML class and activity diagrams that represent the fine-grained behaviors of a software system and its test cases. The second approach, called FLiRTS, was proposed to investigate the use of fuzzy logic to enable RTS based on UML sequence and activity diagrams. The activity diagrams lack the fine-grained details needed to obtain the traceability links between models and test cases. MaRTS exploits reverse engineering tools to generate complete, fine-grained diagrams from source code. FLiRTS is based on refining a provided set of abstract activity diagrams to generate fine-grained activity diagrams. We learned from our experience with MaRTS that performing static analysis on class diagrams enables the identification of test cases that are impacted by changes made to the inheritance hierarchy. Our experience with FLiRTS showed that fuzzy logic can be used to address the uncertainty introduced in the traceability links by the use of refinements of abstract models. However, it became evident that the applicability of MaRTS and FLiRTS is limited because the process that generates complete behavioral diagrams is expensive, does not scale up to real-world projects, and may not always be feasible due to the heterogeneity, complexity, and size of software applications. Therefore, we proposed FLiRTS 2, which extends FLiRTS by dropping the need for behavioral diagrams and relying only on the presence of UML class diagrams. In the absence of behavioral diagrams, fuzzy logic addresses the uncertainty in determining which classes and relationships in the class diagram are actually exercised by the test cases. The generalization and realization relationships in the class diagram are used to identify test cases that are impacted by changes made to the inheritance hierarchy. We conducted a large evaluation of FLiRTS 2 and compared its safety, precision, reduction in test suite size, and the fault detection ability of the reduced test suites with those of two code-based RTS approaches that represent the state of the art for dynamic and static RTS.
The results of our empirical studies showed that FLiRTS 2 achieved high safety and reduction in test suite size. The fault detection ability of the reduced test suites was comparable to that achieved by the full test suites. FLiRTS 2 is applicable to a wide range of systems of varying domains and sizes.
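To make the fuzzy-selection idea concrete, a toy illustration only (not the FLiRTS 2 algorithm; the membership degrees, class names, and threshold are invented): give each test case a membership degree expressing how strongly it is believed to exercise a class, take the maximum degree over the changed classes, and select the tests above a cut-off:

    # Toy fuzzy regression test selection: membership[test][cls] is the
    # (invented) degree to which a test is believed to exercise a class.
    membership = {
        "CartTest":    {"Cart": 0.9, "Order": 0.4, "User": 0.1},
        "OrderTest":   {"Cart": 0.3, "Order": 0.8, "User": 0.2},
        "ProfileTest": {"Cart": 0.0, "Order": 0.1, "User": 0.9},
    }

    changed_classes = {"Order"}    # e.g., Order's inheritance hierarchy changed
    THRESHOLD = 0.3                # selection cut-off (a tunable safety knob)

    selected = [
        test for test, degrees in membership.items()
        if max(degrees.get(cls, 0.0) for cls in changed_classes) >= THRESHOLD
    ]
    print(selected)   # ['CartTest', 'OrderTest']; ProfileTest is skipped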