Attacks and defenses for large language models on coding tasks

Zhang, Chi, author; Wang, Zifan, author; Zhao, Ruoshi, author; Mangal, Ravi, author; Fredrikson, Matt, author; Jia, Limin, author; Pasareanu, Corina, author; ACM, publisher

Files

FACF_ACMOA_3691620.3695297.pdf (561.64 KB)

Date

2024-10-27

Authors

Fredrikson, Matt, author

Jia, Limin, author

Pasareanu, Corina, author

ACM, publisher

Abstract

Modern large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities for coding tasks, including writing and reasoning about code. They improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results on tasks such as code summarization and identifying code vulnerabilities. However, these previous code models were shown to be vulnerable to adversarial examples, i.e., small syntactic perturbations designed to "fool" the models. In this paper, we first study the transferability of adversarial examples, generated through white-box attacks on smaller code models, to LLMs. We also propose a new attack that uses an LLM to generate the perturbations. Further, we propose novel cost-effective techniques to defend LLMs against such adversaries via prompting, without incurring the cost of retraining. These prompt-based defenses modify the prompt to include additional information, such as examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations. Our preliminary experiments show the effectiveness of the attacks and the proposed defenses on popular LLMs such as GPT-3.5 and GPT-4.
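
The two ideas in the abstract, a small perturbation that leaves program behavior unchanged and a prompt that asks the model to reverse such perturbations, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the perturbation is a plain variable rename, and the function names `rename_variable` and `build_defense_prompt` and the prompt wording are hypothetical. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def rename_variable(source: str, old: str, new: str) -> str:
    """Rename identifier `old` to `new` in Python source.

    Illustrative perturbation only: it changes the surface form of the
    code while leaving its behavior intact (assuming `new` is unused).
    """

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.AST:
            # Variable reads and writes.
            if node.id == old:
                node.id = new
            return node

        def visit_arg(self, node: ast.arg) -> ast.AST:
            # Function parameters; also visit any annotation.
            self.generic_visit(node)
            if node.arg == old:
                node.arg = new
            return node

    tree = Renamer().visit(ast.parse(source))
    return ast.unparse(tree)


def build_defense_prompt(code: str, task: str) -> str:
    """Wrap `code` in a prompt that asks for perturbations to be undone first.

    The wording is a hypothetical example of an explicit
    'reverse the perturbation' instruction.
    """
    return (
        "The code below may contain adversarial perturbations, such as "
        "misleading identifier names. First undo any such perturbations, "
        f"then {task}.\n\n"
        f"Code:\n{code}"
    )


original = "def add(a, b):\n    total = a + b\n    return total\n"
# A misleading name that a code model might latch onto.
perturbed = rename_variable(original, "total", "password")
prompt = build_defense_prompt(perturbed, "summarize what the function does")
```

Here `perturbed` computes the same result as `original`, and `prompt` is the text that would be sent to the model in place of the bare code.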

Subject

LLMs

code models

adversarial attacks

robustness

URI

https://hdl.handle.net/10217/239729

Collections

Publications
