Machine learning for computer aided programming: from stochastic program repair to verifiable program equivalence

Kommrusch, Steve, author; Pouchet, Louis-Noël, advisor; Anderson, Charles, advisor; Beveridge, Ross, committee member; Azimi-Sadjadi, Mahmood, committee member

Files

Kommrusch_colostate_0053A_17065.pdf (4.76 MB)

Date

2022

Authors

Kommrusch, Steve, author

Pouchet, Louis-Noël, advisor

Anderson, Charles, advisor

Beveridge, Ross, committee member

Azimi-Sadjadi, Mahmood, committee member

Abstract

Computer programming has benefited from a virtuous cycle of innovation as improvements in computer hardware and software make higher levels of program abstraction and complexity possible. Recent advances in the field of machine learning, including neural network models for translating and answering questions about human language, can also be applied to computer programming itself. This thesis aims to make progress on the problem of using machine learning to improve the quality and robustness of computer programs by contributing new techniques for representation of programming problems, applying neural network models to code, and training procedures to create systems useful for computer aided programming. We first present background and preliminary studies of machine learning concepts. We then present a system that directly produces source code for automatic program repair and advances the state of the art by using a learned copy mechanism during generation. We extend a similar system to tune its learning for security vulnerability repair. We then develop a system for program equivalence that generates deterministically checkable output for equivalent programs. For this work we detail our contribution to the popular OpenNMT-py GitHub project, which is used broadly for neural machine translation. Finally, we show how the deterministically checkable output can provide self-supervised sample selection, which improves the performance and generalizability of the system. We develop breadth metrics to demonstrate that the range of problems addressed is representative of the problem space, while demonstrating that our deep neural networks generate proposed solutions that can be verified in linear time. Ultimately, our work provides promising results in multiple areas of computer aided programming, results that allow human developers to produce quality software more effectively.

Subject

machine learning

program repair

program equivalence

computer aided programming

URI

https://hdl.handle.net/10217/235289

Collections

2020-
Theses and Dissertations
