Browsing by Author "Ghosh, Sudipto, advisor"
Item Open Access
A systematic approach to testing UML designs (Colorado State University. Libraries, 2007)
Dinh-Trong, Trung T., author; France, Robert B., advisor; Ghosh, Sudipto, advisor

In Model Driven Engineering (MDE) approaches, developers create and refine design models from which substantial portions of implementations are generated. During refinement, undetected faults in an abstract model can traverse into the refined models, and eventually into code. Hence, finding and removing faults in design models is essential for MDE approaches to succeed. This dissertation describes a testing approach to finding faults in design models created using the Unified Modeling Language (UML). Executable forms of UML design models are exercised using generated test inputs that provide coverage with respect to UML-based coverage criteria. The UML designs that are tested consist of class diagrams, sequence diagrams, and activity diagrams. The contributions of the dissertation include (1) a test input generation technique, (2) an approach to execute design models describing sequential behavior with test inputs in order to detect faults, and (3) a set of pilot studies that explore the fault detection capability of our testing approach. The test input generation technique involves analyzing design models under test to produce test inputs that satisfy UML sequence diagram coverage criteria. We defined a directed graph structure, named Variable Assignment Graph (VAG), to generate test inputs. The VAG combines information from class and sequence diagrams. Paths are selected from the VAG and constraints are identified to traverse the paths. The constraints are then solved with a constraint solver. The model execution technique involves transforming each design under test into an executable form, which is exercised with the generated inputs. Failures are reported if the observed behavior differs from the expected behavior. We proposed an action language, named Java-like Action Language (JAL), that supports the UML action semantics. We developed a prototype tool, named UMLAnT, that performs test execution and animation of design models. We performed pilot studies to evaluate the fault detection effectiveness of our approach. Mutation faults and commonly occurring faults in UML models created by students in our software engineering courses were seeded in three design models. Ninety percent of the seeded faults were detected using our approach.

Item Open Access
A systematic approach to testing UML designs (Colorado State University. Libraries, 2006)
Dinh-Trong, Trung T., author; France, Robert B., advisor; Ghosh, Sudipto, advisor; Bieman, James M., committee member; Malaiya, Yashwant K., committee member; Fan, Chuen-mei, committee member

In Model Driven Engineering (MDE) approaches, developers create and refine design models from which substantial portions of implementations are generated. During refinement, undetected faults in an abstract model can traverse into the refined models, and eventually into code. Hence, finding and removing faults in design models is essential for MDE approaches to succeed. This dissertation describes an approach to finding faults in design models created using the Unified Modeling Language (UML). Executable forms of UML design models are exercised using generated test inputs that provide coverage with respect to UML-based coverage criteria. The UML designs that are tested consist of class diagrams, sequence diagrams, and activity diagrams.

The contributions of the dissertation include (1) a test input generation technique, (2) an approach to execute design models describing sequential behavior with test inputs in order to detect faults, and (3) a set of pilot studies that explore the fault detection capability of our testing approach. The test input generation technique involves analyzing design models under test to produce test inputs that satisfy UML sequence diagram coverage criteria. We defined a directed graph structure, named Variable Assignment Graph (VAG), to generate test inputs. The VAG combines information from class and sequence diagrams. Paths are selected from the VAG and constraints are identified to traverse the paths. The constraints are then solved with a constraint solver. The model execution technique involves transforming each design under test into an executable form, which is exercised with the generated inputs. Failures are reported if the observed behavior differs from the expected behavior. We proposed an action language, named Java-like Action Language (JAL), that supports the UML action semantics. We developed a prototype tool, named UMLAnT, that performs test execution and animation of design models. We performed pilot studies to evaluate the fault detection effectiveness of our approach. Mutation faults and commonly occurring faults in UML models created by students in our software engineering courses were seeded in three design models. Ninety percent of the seeded faults were detected using our approach.
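To make the path-and-constraint step in these two records concrete, here is a minimal sketch; the graph, guard strings, and banking example are invented illustrations, not code from the dissertation. It enumerates paths through a toy VAG whose edges carry guard constraints; a real implementation would hand each path's constraint set to a constraint solver to obtain concrete test inputs.

```java
import java.util.*;

// Hypothetical sketch: enumerate paths through a tiny variable-assignment
// graph (VAG) and collect the guard constraints that must hold to traverse
// each path. A real implementation would pass the constraints to a solver.
public class VagPathSketch {
    // Edges of the graph: node -> list of (successor, guard constraint).
    static final Map<String, List<String[]>> EDGES = Map.of(
        "start",   List.of(new String[]{"deposit", "amount > 0"},
                           new String[]{"reject",  "amount <= 0"}),
        "deposit", List.of(new String[]{"end", "balance' == balance + amount"}),
        "reject",  List.of(new String[]{"end", "balance' == balance"}));

    public static void main(String[] args) {
        walk("start", new ArrayList<>(), new ArrayList<>());
    }

    // Depth-first enumeration; each complete path yields a constraint set
    // whose solution (e.g., amount = 5) is a candidate test input.
    static void walk(String node, List<String> path, List<String> constraints) {
        path.add(node);
        if (node.equals("end")) {
            System.out.println("path " + path + " requires " + constraints);
        } else {
            for (String[] edge : EDGES.getOrDefault(node, List.of())) {
                List<String> extended = new ArrayList<>(constraints);
                extended.add(edge[1]);
                walk(edge[0], new ArrayList<>(path), extended);
            }
        }
    }
}
```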
Item Open Access
An approach for testing the extract-transform-load process in data warehouse systems (Colorado State University. Libraries, 2018)
Homayouni, Hajar, author; Ghosh, Sudipto, advisor; Ray, Indrakshi, advisor; Bieman, James M., committee member; Vijayasarathy, Leo R., committee member

Enterprises use data warehouses to accumulate data from multiple sources for data analysis and research. Since organizational decisions are often made based on the data stored in a data warehouse, all its components must be rigorously tested. In this thesis, we first present a comprehensive survey of data warehouse testing approaches, and then develop and evaluate an automated testing approach for validating the Extract-Transform-Load (ETL) process, which is a common activity in data warehousing. In the survey we present a classification framework that categorizes the testing and evaluation activities applied to the different components of data warehouses. These approaches include dynamic analysis as well as static evaluation and manual inspections. The classification framework uses information related to what is tested in terms of the data warehouse component that is validated, and how it is tested in terms of various types of testing and evaluation approaches. We discuss the specific challenges and open problems for each component and propose research directions. The ETL process involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex one-to-one, many-to-one, and many-to-many transformations involving sources and targets that use different schemas, databases, and technologies. Since faulty implementations in any of the ETL steps can result in incorrect information in the target data warehouse, ETL processes must be thoroughly validated.

In this thesis, we propose automated balancing tests that check for discrepancies between the data in the source databases and that in the target warehouse. Balancing tests ensure that the data obtained from the source databases is not lost or incorrectly modified by the ETL process. First, we categorize and define a set of properties to be checked in balancing tests. We identify various types of discrepancies that may exist between the source and the target data, and formalize three categories of properties, namely, completeness, consistency, and syntactic validity, that must be checked during testing. Next, we automatically identify source-to-target mappings from ETL transformation rules provided in the specifications. We identify one-to-one, many-to-one, and many-to-many mappings for tables, records, and attributes involved in the ETL transformations. We automatically generate test assertions to verify the properties for balancing tests. We use the source-to-target mappings to automatically generate assertions corresponding to each property. The assertions compare the data in the target data warehouse with the corresponding data in the sources to verify the properties. We evaluate our approach on a health data warehouse that uses data sources with different data models running on different platforms. We demonstrate that our approach can find previously undetected real faults in the ETL implementation. We also provide an automatic mutation testing approach to evaluate the fault-finding ability of our balancing tests. Using mutation analysis, we demonstrate that our auto-generated assertions can detect faults in the data inside the target data warehouse when faulty ETL scripts execute on mock source data.
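A balancing test of the kind described above can be sketched as a minimal JUnit check, assuming JDBC access to one source database and to the target warehouse. The connection URLs, table names, and the specific completeness assertion are placeholders for illustration, not the thesis's generated assertions.

```java
import java.sql.*;
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical balancing test: a completeness check comparing a record
// count in a source table with the corresponding count in the target
// warehouse. URLs and table names are invented placeholders.
public class BalancingTestSketch {
    @Test
    public void patientCountIsPreservedByEtl() throws SQLException {
        long sourceCount = scalar("jdbc:postgresql://source/db",
                                  "SELECT COUNT(*) FROM patients");
        long targetCount = scalar("jdbc:postgresql://warehouse/db",
                                  "SELECT COUNT(*) FROM dim_patient");
        // Completeness property: no source records lost by the ETL process.
        assertEquals(sourceCount, targetCount);
    }

    private static long scalar(String url, String query) throws SQLException {
        try (Connection c = DriverManager.getConnection(url);
             Statement s = c.createStatement();
             ResultSet r = s.executeQuery(query)) {
            r.next();
            return r.getLong(1);
        }
    }
}
```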
Item Open Access
An empirical comparison of four Java-based regression test selection techniques (Colorado State University. Libraries, 2020)
Shin, Min Kyung, author; Ghosh, Sudipto, advisor; Moreno Cubillos, Laura, committee member; Vijayasarathy, Leo R., committee member

Regression testing is crucial to ensure that previously tested functionality is not broken by additions, modifications, and deletions to the program code. Since regression testing is an expensive process, researchers have developed regression test selection (RTS) techniques, which select and execute only those test cases that are impacted by the code changes. In general, an RTS technique has two main activities: (1) determining dependencies between the source code and test cases, and (2) identifying the code changes. Different approaches exist in the research literature to compute dependencies statically or dynamically at different levels of granularity. Code changes can also be identified at different levels of granularity using different techniques. As a result, RTS techniques possess different characteristics related to the amount of reduction in the test suite size, the time to select and run the test cases, test selection accuracy, and the fault detection ability of the selected subset of test cases. Researchers have empirically evaluated RTS techniques, but the evaluations were generally conducted using different experimental settings. This thesis compares four recent Java-based RTS techniques, Ekstazi, HyRTS, OpenClover, and STARTS, with respect to the above-mentioned characteristics using multiple revisions from five open source projects.

It investigates the relationship between four program features and the performance of RTS techniques: total (program and test suite) size in KLOC, total number of classes, percentage of test classes over the total number of classes, and percentage of classes that changed between revisions. The results show that STARTS, a static RTS technique, over-estimates the dependencies between test cases and program code, and thus selects more test cases than the dynamic RTS techniques Ekstazi and HyRTS, even though all three identify code changes in the same way. OpenClover identifies code changes differently from Ekstazi, HyRTS, and STARTS, and selects more test cases. STARTS achieved the lowest safety violation with respect to Ekstazi, and HyRTS achieved the lowest precision violation with respect to both STARTS and Ekstazi. Overall, the average fault detection ability of the RTS techniques was 8.75% lower than that of the original test suite. STARTS, Ekstazi, and HyRTS achieved higher test suite size reduction on the projects with over 100 KLOC than on those with less than 100 KLOC. OpenClover achieved a higher test suite size reduction in the subjects that had a smaller total number of classes. The time reduction of OpenClover is affected by the combination of the number of source classes and the number of test cases in the subjects: the higher the number of test cases and source classes, the lower the time reduction.
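The class-level selection step that these tools share can be sketched in a few lines of Java. The checksums and dependency sets below are invented stand-ins for what the tools actually derive from bytecode hashing and per-test dynamic coverage or static analysis.

```java
import java.util.*;

// Sketch of class-level RTS selection in the style of Ekstazi/HyRTS/STARTS:
// a test is re-run if any class in its dependency set has a changed
// checksum between revisions. All values here are illustrative.
public class RtsSelectionSketch {
    public static void main(String[] args) {
        Map<String, Long> oldSums = Map.of("Account.class", 111L, "Ledger.class", 222L);
        Map<String, Long> newSums = Map.of("Account.class", 111L, "Ledger.class", 999L);
        Map<String, Set<String>> deps = Map.of(
            "AccountTest", Set.of("Account.class"),
            "LedgerTest",  Set.of("Account.class", "Ledger.class"));

        deps.forEach((test, classes) -> {
            boolean affected = classes.stream()
                .anyMatch(c -> !Objects.equals(oldSums.get(c), newSums.get(c)));
            System.out.println(test + (affected ? ": selected" : ": skipped"));
        });
    }
}
```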
Item Open Access
Anomaly detection and explanation in big data (Colorado State University. Libraries, 2021)
Homayouni, Hajar, author; Ghosh, Sudipto, advisor; Ray, Indrakshi, advisor; Bieman, James M., committee member; Ray, Indrajit, committee member; Vijayasarathy, Leo R., committee member

Data quality tests are used to validate the data stored in databases and data warehouses, and to detect violations of syntactic and semantic constraints. Domain experts grapple with the issues related to capturing all the important constraints and checking that they are satisfied. The constraints are often identified in an ad hoc manner based on knowledge of the application domain and the needs of the stakeholders. Constraints can exist over single or multiple attributes as well as over records involving time series and sequences. Constraints involving multiple attributes can involve both linear and non-linear relationships among the attributes. We propose ADQuaTe, a data quality test framework that automatically (1) discovers different types of constraints from the data, (2) marks records that violate the constraints as suspicious, and (3) explains the violations. Domain knowledge is required to determine whether or not the suspicious records are actually faulty. The framework can incorporate feedback from domain experts to improve the accuracy of constraint discovery and anomaly detection. We instantiate ADQuaTe in two ways to detect anomalies in non-sequence and sequence data. The first instantiation (ADQuaTe2) uses an unsupervised approach called an autoencoder for constraint discovery in non-sequence data. ADQuaTe2 analyzes records in isolation to discover constraints among the attributes. We evaluate the effectiveness of ADQuaTe2 using real-world non-sequence datasets from the human health and plant diagnosis domains. We demonstrate that ADQuaTe2 can discover new constraints that were previously unspecified in existing data quality tests, and can report both previously detected and new faults in the data.

We also use non-sequence datasets from the UCI repository to evaluate the improvement in the accuracy of ADQuaTe2 after incorporating ground truth knowledge and retraining the autoencoder model. The second instantiation (IDEAL) uses an unsupervised LSTM-autoencoder for constraint discovery in sequence data. IDEAL analyzes the correlations and dependencies among data records to discover constraints. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and the Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies in these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy of the approach improves after incorporating ground truth knowledge about the injected faults and retraining the LSTM-autoencoder model. The novelty of this research lies in the development of a domain-independent framework that effectively and efficiently discovers different types of constraints from the data, detects and explains anomalous data, and minimizes false alarms through an interactive learning process.
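Once an autoencoder has been trained on mostly-clean data, the flagging step reduces to thresholding reconstruction error. The sketch below illustrates that step only; the error values and the mean-plus-two-standard-deviations cut-off are assumptions for illustration, not ADQuaTe's actual decision rule.

```java
// Hypothetical sketch of the anomaly-flagging step assumed to follow
// autoencoder training: records whose reconstruction error exceeds
// mean + k standard deviations are marked suspicious for expert review.
public class ReconstructionErrorSketch {
    public static void main(String[] args) {
        double[] errors = {0.02, 0.03, 0.01, 0.04, 0.65, 0.02}; // per-record errors
        double mean = 0, sq = 0;
        for (double e : errors) mean += e;
        mean /= errors.length;
        for (double e : errors) sq += (e - mean) * (e - mean);
        double std = Math.sqrt(sq / errors.length);
        double threshold = mean + 2 * std;  // k = 2 chosen for this toy data
        for (int i = 0; i < errors.length; i++)
            if (errors[i] > threshold)
                System.out.println("record " + i + " suspicious (error=" + errors[i] + ")");
    }
}
```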
Item Open Access
Constructing subtle higher order mutants from Java and AspectJ programs (Colorado State University. Libraries, 2015)
Omar, Elmahdi, author; Ghosh, Sudipto, advisor; Whitley, Darrell, advisor; Bieman, James M., committee member; Turk, Daniel E., committee member

Mutation testing is a fault-based testing technique that helps testers measure and improve the fault-detection effectiveness of their test suites. However, a majority of traditional First Order Mutants (FOMs), which are created by making a single syntactic change to the source code, represent trivial faults that are often easily detected (i.e., killed). Research has shown that the majority of real faults not detected during testing are complex faults that cannot be simulated with FOMs, because fixing these faults requires making multiple changes to the source code. Higher Order Mutants (HOMs), which are created by making multiple syntactic changes to the source code, can be used to simulate such faults. The majority of HOMs of a given program are killed by any test suite that kills all the FOMs. We refer to HOMs that are not killed as subtle HOMs. They represent cases where single faults interact by masking each other with respect to the given test suite and produce complex faulty behavior that cannot be simulated with FOMs. The fault-detection effectiveness of the given test suite can be improved by adding test cases that detect the faults denoted by subtle HOMs. Because subtle HOMs are rare in the exponentially large space of candidate HOMs, the cost of finding them can be high even for small programs. A brute force approach that evaluates every HOM in the search space by constructing, compiling, and executing the HOM against the given test suite is unrealistic. We developed a set of search techniques for finding subtle HOMs in the context of the Java and AspectJ programming languages. We chose Java because of its popularity, and the availability of experimental tools and open source programs. We chose AspectJ because of its unique concepts and constructs and their consequent testing challenges. We developed four search-based software engineering techniques: (1) Genetic Algorithm, (2) Local Search, (3) Test-Case Guided Local Search, and (4) Data-Interaction Guided Local Search. We also developed a Restricted Random Search technique and a Restricted Enumeration Search technique.

Each search technique explores the search space in a different way, which affects the types of subtle HOMs each technique can find. Each of the guided local search techniques uses a heuristic to improve the ability of Local Search to find subtle HOMs. Due to the unavailability of higher order mutation testing tools for AspectJ and Java programs, we developed HOMAJ, a Higher Order Mutation Testing tool for AspectJ and Java programs, for finding subtle HOMs. HOMAJ implements the developed search techniques and automates the process of creating, compiling, and executing both FOMs and HOMs. The results of our empirical studies show that all of the search techniques were able to find subtle HOMs. However, Local Search and both of the Guided Local Search techniques were more effective than the other techniques in terms of their ability to find subtle HOMs. The search techniques found more subtle HOMs by combining faults created by primitive Java mutation operators than by combining faults created by Java class-level operators and AspectJ operators. Composing subtle HOMs of lower degrees generated by Restricted Enumeration Search is an effective way to find new subtle HOMs of higher degrees, because such HOMs are likely to exist as compositions of multiple subtle HOMs of lower degrees. However, the search-based software engineering techniques were able to find subtle HOMs of higher degrees that could not be found by combining subtle HOMs of lower degrees.
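A toy example shows the masking interaction that makes a HOM hard to kill; the guard and its mutations are invented for illustration. The combined mutant below is the extreme case of an equivalent HOM, which no test can kill; a subtle HOM is the weaker case of a combination that merely survives the particular test suite at hand. (Run with assertions enabled, -ea.)

```java
// Illustration of fault masking under two hypothetical mutations of a guard.
// Each first-order mutant is easy to kill, but their combination behaves
// exactly like the original on every input.
public class HomMaskingSketch {
    static boolean original(int x) { return x < 10; }

    static boolean fom1(int x) { return x <= 10; }  // '<' -> '<=': killed by x == 10
    static boolean fom2(int x) { return x < 9; }    // 10 -> 9:     killed by x == 9
    static boolean hom(int x)  { return x <= 9; }   // both changes: equivalent to x < 10

    public static void main(String[] args) {
        for (int x = -100; x < 100; x++)
            assert original(x) == hom(x);  // the two faults mask each other
        System.out.println("HOM indistinguishable from the original on all inputs");
    }
}
```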
Item Open Access
Improving software maintainability through aspectualization (Colorado State University. Libraries, 2009)
Mortensen, Michael, author; Ghosh, Sudipto, advisor; Bieman, James M., advisor

The primary claimed benefit of aspect-oriented programming (AOP) is that it improves the understandability and maintainability of software applications by modularizing cross-cutting concerns. Before there is widespread adoption of AOP, developers need further evidence of the actual benefits as well as the costs. Applying AOP techniques to refactor legacy applications is one way to evaluate costs and benefits. Aspect-based refactoring, called aspectualization, involves moving program code that implements cross-cutting concerns into aspects. Such refactoring can potentially improve the maintainability of legacy systems. Long compilation and weave times, and the lack of an appropriate testing methodology, are two challenges to the aspectualization of large legacy systems. We propose an iterative test-driven approach for creating and introducing aspects. The approach uses mock systems that enable aspect developers to quickly experiment with different pointcuts and advice, and reduce the compile and weave times. The approach also uses weave analysis, regression testing, and code coverage analysis to test the aspects. We developed several tools for unit and integration testing. We demonstrate the test-driven approach in the context of large industrial C++ systems, and we provide guidelines for mock system creation. This research examines the effects on maintainability of replacing cross-cutting concerns with aspects in three industrial applications. We study several revisions of each application, identifying cross-cutting concerns in the initial revision, as well as cross-cutting concerns that are added in later revisions. Aspectualization improved maintainability by reducing code size and improving both change locality and concern diffusion. Costs include the effort required for application refactoring and aspect creation, as well as a small decrease in performance.
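The refactoring pattern studied here can be pictured with a small aspect, shown in AspectJ syntax for concreteness even though the dissertation's subject systems were C++. The package, class names, and tracing concern are hypothetical: scattered logging calls are deleted from the classes and centralized in one module.

```aspectj
// A minimal sketch of aspectualizing a tracing concern: calls that were
// scattered across every method body move into a single aspect.
public aspect TraceAspect {
    // All public methods of the application's (hypothetical) service classes.
    pointcut traced() : execution(public * com.example.service.*.*(..));

    before() : traced() {
        System.out.println("enter " + thisJoinPointStaticPart.getSignature());
    }

    after() returning : traced() {
        System.out.println("exit  " + thisJoinPointStaticPart.getSignature());
    }
}
```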
Item Open Access
Mitigating the effect of coincidental correctness in spectrum based fault localization (Colorado State University. Libraries, 2013)
Bandyopadhyay, Aritra, author; Ghosh, Sudipto, advisor; Bieman, James M., committee member; France, Robert B., committee member; Strout, Michelle Mills, committee member; Turk, Daniel, committee member

Coincidentally correct test cases are those that execute faulty program statements but do not result in failures. The presence of such test cases in a test suite reduces the effectiveness of spectrum-based fault localization approaches, such as Ochiai and Tarantula, which localize faulty statements by calculating a suspiciousness score for every program statement from test coverage information. The goal of this dissertation is to improve the understanding of how the presence of coincidentally correct test cases impacts the effectiveness of spectrum-based fault localization approaches, and to develop a family of approaches that improve fault localization effectiveness by mitigating the effect of coincidentally correct test cases. Each approach (1) classifies coincidentally correct test cases using test coverage information, and (2) recalculates a suspiciousness score for every program statement using the classification information. We developed classification approaches using test coverage metrics at different levels of granularity, such as statement, branch, and function. We developed a new approach for ranking program statements using suspiciousness scores calculated based on the heuristic that statements covered by more failing and coincidentally correct test cases are more suspicious. We extended the family of fault localization approaches to support multiple faults. We developed an approach to incorporate tester feedback to mitigate the effect of coincidental correctness. The approach analyzes tester feedback to determine a lower bound for the number of coincidentally correct test cases present in a test suite. The lower bound is also used to determine when classification of coincidentally correct test cases can improve fault localization effectiveness. We evaluated the fault localization effectiveness of our approaches and studied how the effectiveness changes for varying characteristics of test suites, such as size, test suite type (e.g., random, coverage adequate), and the percentage of passing test cases that are coincidentally correct. Our key findings are summarized as follows. Mitigating the effect of coincidentally correct test cases improved fault localization effectiveness. The extent of the improvement increased with an increase in the percentage of passing test cases that were coincidentally correct, although no improvement was observed when most passing test cases in a test suite were coincidentally correct. When random test suites were used to localize faults, a coarse-grained coverage spectrum, such as function coverage, resulted in better classification than fine-grained coverage spectra, such as statement and branch coverage. Utilizing tester feedback improved the precision of classification. Mitigating the effect of coincidental correctness in the presence of two faults improved the effectiveness for both faults simultaneously for most faulty programs. Faulty statements that were harder to reach and that affected fewer program statements resulted in fewer coincidentally correct test cases and were more effectively localized.
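For reference, the two suspiciousness scores named above can be computed as in the sketch below. The formulas are the standard published Tarantula and Ochiai metrics; the coverage counts in main are invented. A coincidentally correct test inflates the passed count of the faulty statement, which depresses both scores and is exactly why reclassifying such tests sharpens the ranking.

```java
// Standard Tarantula and Ochiai suspiciousness, computed per statement from
// coverage counts: 'failed'/'passed' = tests covering the statement,
// 'totalFailed'/'totalPassed' = totals in the suite.
public class SuspiciousnessSketch {
    static double tarantula(int failed, int passed, int totalFailed, int totalPassed) {
        double failRatio = (double) failed / totalFailed;
        double passRatio = (double) passed / totalPassed;
        return failRatio / (failRatio + passRatio);
    }

    static double ochiai(int failed, int passed, int totalFailed) {
        return failed / Math.sqrt((double) totalFailed * (failed + passed));
    }

    public static void main(String[] args) {
        // A statement covered by 4 of 5 failing tests and 10 of 95 passing ones.
        System.out.printf("tarantula=%.3f ochiai=%.3f%n",
                tarantula(4, 10, 5, 95), ochiai(4, 10, 5));
    }
}
```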
Item Open Access
Random generation of valid object configurations for testing object-oriented programs (Colorado State University. Libraries, 2012)
Sadhu, Devadatta, author; Ghosh, Sudipto, advisor; France, Robert, committee member; Turk, Daniel, committee member

A unit test case for an object-oriented program typically requires the creation of an object configuration on which the method under test is invoked. Certain approaches, such as RANDOOP, perform feedback-directed random test generation. RANDOOP incrementally generates test cases by extending only the valid ones. Invalid test cases are not explored, and thus RANDOOP can miss the creation of some valid object configurations. In our approach, we generate a new random object configuration for each test case. This configuration may or may not satisfy the multiplicity constraints in the UML class model of the program. Instead of discarding an invalid configuration, we attempt to fix it and generate a valid test case. Since we do not reject any test case, and do not depend on feedback from previous test cases, our object configurations are likely to obtain higher coverage of the domain of valid configurations than RANDOOP. We implemented our approach in a prototype tool called RanTGen, which produces JUnit-style test cases, and we also created an Eclipse plugin for RanTGen. Our preliminary results show that RanTGen takes less time than RANDOOP to generate the same number of test cases. RanTGen test cases kill more mutants and achieve higher coverage in terms of statements, branches, and association-end multiplicity (AEM) than RANDOOP test cases. The AEM coverage criterion defines the set of representative multiplicity tuples that must be created during a test, and is used to measure coverage of the domain of valid configurations.
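The generate-then-fix idea can be sketched as follows. The Department/Employee association and its 1..3 multiplicity are invented for illustration, and RanTGen's actual repair strategy may differ; the point is that an invalid random configuration is repaired against the class model rather than rejected.

```java
import java.util.*;

// Hypothetical sketch of RanTGen-style generation: build a random object
// configuration, then repair it against a multiplicity constraint from
// the UML class model instead of discarding it.
public class ConfigGenSketch {
    static final Random RAND = new Random();

    public static void main(String[] args) {
        // Random configuration: a department with 0..5 employees.
        List<String> employees = new ArrayList<>();
        int n = RAND.nextInt(6);
        for (int i = 0; i < n; i++) employees.add("emp" + i);

        // Repair toward the multiplicity Department 1..3 Employee:
        // add missing objects or trim extras rather than rejecting.
        while (employees.size() < 1) employees.add("emp" + employees.size());
        while (employees.size() > 3) employees.remove(employees.size() - 1);

        System.out.println("valid configuration: " + employees);
    }
}
```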
Item Open Access
Testing with state variable data-flow criteria for aspect-oriented programs (Colorado State University. Libraries, 2011)
Wedyan, Fadi, author; Ghosh, Sudipto, advisor; Bieman, James M., committee member; Malaiya, Yashwant K., committee member; Vijayasarathy, Leo, committee member

Data-flow testing approaches have been used for procedural and object-oriented (OO) programs, and have been empirically shown to be effective in detecting faults. However, few such approaches have been proposed for aspect-oriented (AO) programs. In an AO program, data-flow interactions can occur between the base classes and aspects, which can affect the behavior of both. Faults resulting from such interactions are hard to detect unless the interactions are specifically targeted during testing. In this research, we propose a data-flow testing approach for AO programs. In an AO program, an aspect and a base class interact either through parameters passed from advised methods in the base class to the advice, or through the direct reading and writing of the base class state variables in the advice. We identify a group of def-use associations (DUAs) that are based on the base class state variables and propose a set of data-flow test criteria that require executing these DUAs. We identify fault types that result from incorrect data-flow interactions in AO programs and extend an existing AO fault model to include these faults.

We implemented our approach in a tool that identifies the DUAs targeted by the proposed criteria, runs a test suite, and computes the coverage results. We conducted an empirical study that compares the cost and effectiveness of the proposed criteria with those of two control-flow criteria. The empirical study was performed using four subject programs. We seeded faults in the programs using three mutation tools: AjMutator, Proteum/AJ, and μJava. We used a test generation tool, called RANDOOP, to generate a pool of random test cases. To produce a test suite that satisfies a criterion, we randomly selected test cases from the test pool until the required coverage for the criterion was reached. We evaluated three dimensions of the cost of a test criterion. The first dimension is the size of a test suite that satisfies a test criterion, which we measured by the number of test cases in the test suite. The second cost dimension is the density of a test case, which we measured by the number of test cases in the test suite divided by the number of test requirements. The third cost dimension is the time needed to randomly obtain a test suite that satisfies a criterion, which we measured by (1) the number of iterations required by the test suite generator to randomly select test cases from the pool until a test criterion is satisfied, and (2) the number of iterations per test requirement. Effectiveness is measured by the mutation scores of the test suites that satisfy a criterion. We evaluated effectiveness for all faults and for each fault type. Our results show that the test suites that cover all the DUAs of state variables are more effective in revealing faults than the control-flow criteria, although they cost more in terms of test suite size and effort. The results also show that the test suites that cover state variable DUAs in advised classes are suitable for detecting most of the fault types in the revised AO fault model. Finally, we evaluated the cost-effectiveness of the test suites that cover all state variable DUAs at three coverage levels: 100%, 90%, and 80%. The results show that the test suites that cover 90% of the state variable DUAs are the most cost-effective.
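One state-variable DUA of the kind these criteria target can be illustrated in AspectJ; the classes, the aspect, and the fee concern are invented, not taken from the dissertation. The advice writes the base class state variable (the def) and a later call to report() reads it (the use); a test exercises this DUA only if it executes both, which is precisely the aspect-base interaction the criteria force tests to cover.

```aspectj
// Invented example of a def-use association on a base class state variable,
// where the definition occurs inside advice and the use in the base class.
public aspect FeeAspect {
    // def of 'balance' inside after-advice on every withdrawal
    after(Account acct) : execution(void Account.withdraw(int)) && target(acct) {
        acct.balance = acct.balance - 1;   // def: deduct a service fee
    }
}

class Account {
    int balance = 100;

    void withdraw(int amount) { balance -= amount; }

    int report() { return balance; }       // use of 'balance'
}
```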
Item Open Access
Towards model-based regression test selection (Colorado State University. Libraries, 2019)
Al-Refai, Mohammed, author; Ghosh, Sudipto, advisor; Cazzola, Walter, advisor; Bieman, James M., committee member; Ray, Indrakshi, committee member; Vijayasarathy, Leo, committee member

Modern software development processes often use UML models to plan and manage the evolution of software systems. Regression testing is important to ensure that the evolution or adaptation did not break existing functionality. Regression testing can be expensive and is performed with limited resources and under time constraints. Regression test selection (RTS) approaches are used to reduce the cost. RTS is performed by analyzing the changes made to a system at the code or model level. Existing model-based RTS approaches that use UML models have some limitations. They do not take into account the impact of changes to the inheritance hierarchy of the classes on test case selection. They use behavioral models to perform impact analysis and obtain traceability links between model elements and test cases. However, in practice, structural models such as class diagrams are most commonly used for designing and maintaining applications.

Behavioral models are rarely used, and even when they are used, they tend to be incomplete and lack the fine-grained details needed to obtain the traceability links, which limits the applicability of the existing UML-based RTS approaches. The goal of this dissertation is to address these limitations and improve the applicability of model-based RTS in practice. To achieve this goal, we proposed a new model-based RTS approach called FLiRTS 2. The development of FLiRTS 2 was driven by our experience with two earlier model-based RTS approaches. The first approach, called MaRTS, incorporates information related to inheritance hierarchy changes into test case selection. MaRTS is based on UML class and activity diagrams that represent the fine-grained behaviors of a software system and its test cases. The second approach, called FLiRTS, investigates the use of fuzzy logic to enable RTS based on UML sequence and activity diagrams. The activity diagrams lack the fine-grained details needed to obtain the traceability links between models and test cases. MaRTS exploits reverse engineering tools to generate complete, fine-grained diagrams from source code, whereas FLiRTS is based on refining a provided set of abstract activity diagrams to generate fine-grained activity diagrams. We learned from our experience with MaRTS that performing static analysis on class diagrams enables the identification of test cases that are impacted by changes made to the inheritance hierarchy. Our experience with FLiRTS showed that fuzzy logic can be used to address the uncertainty introduced into the traceability links by the use of refinements of abstract models. However, it became evident that the applicability of MaRTS and FLiRTS is limited, because the process that generates complete behavioral diagrams is expensive, does not scale up to real-world projects, and may not always be feasible due to the heterogeneity, complexity, and size of software applications. Therefore, we proposed FLiRTS 2, which extends FLiRTS by dropping the need for behavioral diagrams and instead relying only on the presence of UML class diagrams. In the absence of behavioral diagrams, fuzzy logic addresses the uncertainty in determining which classes and relationships in the class diagram are actually exercised by the test cases. The generalization and realization relationships in the class diagram are used to identify test cases that are impacted by changes made to the inheritance hierarchy. We conducted a large evaluation of FLiRTS 2 and compared its safety, precision, reduction in test suite size, and the fault detection ability of the reduced test suites with those of two code-based RTS approaches that represent the state of the art for dynamic and static RTS. The results of our empirical studies showed that FLiRTS 2 achieved high safety and reduction in test suite size. The fault detection ability of the reduced test suites was comparable to that achieved by the full test suites. FLiRTS 2 is applicable to a wide range of systems of varying domains and sizes.
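A rough sketch of selection under uncertainty in the spirit of FLiRTS 2 follows; the membership values, threshold, and class names are invented and do not reproduce the dissertation's actual fuzzy model. The idea shown is that generalization relationships contribute certain evidence that a test exercises a changed class, while other class-diagram relationships contribute weaker evidence.

```java
import java.util.*;

// Hypothetical sketch: select a test class when a changed class appears in
// its class-diagram dependency closure with sufficient membership, treating
// inheritance as certain (1.0) and plain associations as weaker evidence.
public class FuzzyRtsSketch {
    public static void main(String[] args) {
        Set<String> changed = Set.of("SavingsAccount");

        // test class -> (reachable class -> membership that it is exercised)
        Map<String, Map<String, Double>> reach = Map.of(
            "AccountTest", Map.of("Account", 1.0,
                                  "SavingsAccount", 1.0,   // subclass of Account
                                  "AuditLog", 0.4));       // association only

        double threshold = 0.3; // select when evidence exceeds this cut-off
        reach.forEach((test, classes) -> {
            double evidence = changed.stream()
                .mapToDouble(c -> classes.getOrDefault(c, 0.0)).max().orElse(0.0);
            if (evidence >= threshold)
                System.out.println(test + " selected (evidence=" + evidence + ")");
        });
    }
}
```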