Attacks and defenses for large language models on coding tasks

dc.contributor.author: Zhang, Chi, author
dc.contributor.author: Wang, Zifan, author
dc.contributor.author: Zhao, Ruoshi, author
dc.contributor.author: Mangal, Ravi, author
dc.contributor.author: Fredrikson, Matt, author
dc.contributor.author: Jia, Limin, author
dc.contributor.author: Pasareanu, Corina, author
dc.contributor.author: ACM, publisher
dc.date.accessioned: 2024-12-17T19:12:10Z
dc.date.available: 2024-12-17T19:12:10Z
dc.date.issued: 2024-10-27
dc.description.abstract: Modern large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities on coding tasks, including writing and reasoning about code. They improve upon previous neural network models of code, such as code2seq or seq2seq, which already achieved competitive results on tasks such as code summarization and identifying code vulnerabilities. However, these earlier code models were shown to be vulnerable to adversarial examples, i.e., small syntactic perturbations designed to "fool" the models. In this paper, we first study the transferability of adversarial examples, generated through white-box attacks on smaller code models, to LLMs. We also propose a new attack that uses an LLM to generate the perturbations. Further, we propose novel cost-effective techniques to defend LLMs against such adversaries via prompting, without incurring the cost of retraining. These prompt-based defenses involve modifying the prompt to include additional information, such as examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations. Our preliminary experiments show the effectiveness of the attacks and the proposed defenses on popular LLMs such as GPT-3.5 and GPT-4.
dc.format.medium: born digital
dc.format.medium: articles
dc.identifier.bibliographicCitation: Chi Zhang, Zifan Wang, Ruoshi Zhao, Ravi Mangal, Matt Fredrikson, Limin Jia, and Corina S. Păsăreanu. 2024. Attacks and Defenses for Large Language Models on Coding Tasks. In 39th IEEE/ACM International Conference on Automated Software Engineering (ASE '24), October 27 - November 1, 2024, Sacramento, CA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3691620.3695297
dc.identifier.doi: https://doi.org/10.1145/3691620.3695297
dc.identifier.uri: https://hdl.handle.net/10217/239729
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: Publications
dc.relation.ispartof: ACM DL Digital Library
dc.rights: © Chi Zhang, et al. ACM 2024. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ASE '24, https://dx.doi.org/10.1145/3691620.3695297.
dc.subject: LLMs
dc.subject: code models
dc.subject: adversarial attacks
dc.subject: robustness
dc.title: Attacks and defenses for large language models on coding tasks
dc.type: Text
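
The prompt-based defenses described in the abstract lend themselves to a short illustration. The following sketch is not taken from the paper: the few-shot perturbed/clean pair and the query_llm helper are hypothetical stand-ins, intended only to make concrete the idea of prepending an adversarially perturbed example and an explicit reversal instruction to a coding prompt.

# A minimal sketch of a prompt-based defense for coding tasks, assuming the
# strategy outlined in the abstract: augment the prompt with (a) an example of
# adversarially perturbed code and (b) an instruction to reverse perturbations.
# The example pair below is illustrative, not taken from the paper.

PERTURBED_EXAMPLE = """\
def f(x):
    unused_var_xq17 = 0  # dead code inserted by an attacker
    return x * 2
"""

CLEAN_EXAMPLE = """\
def f(x):
    return x * 2
"""

def build_defended_prompt(task: str, code: str) -> str:
    """Wrap a coding task in a defense prompt that warns the model about
    adversarial perturbations and shows one perturbed/clean pair."""
    return (
        "The following code may contain small adversarial perturbations "
        "(e.g., renamed identifiers or inserted dead code) designed to "
        "mislead you. Reverse any such perturbations before answering.\n\n"
        "Example of perturbed code:\n" + PERTURBED_EXAMPLE + "\n"
        "The same code with the perturbation removed:\n" + CLEAN_EXAMPLE + "\n"
        "Task: " + task + "\n\nCode:\n" + code
    )

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to an actual chat-completion API.
    raise NotImplementedError

if __name__ == "__main__":
    prompt = build_defended_prompt(
        task="Summarize what this function does.",
        code="def g(y):\n    tmp_z9 = None  # dead code\n    return y + 1\n",
    )
    print(prompt)  # inspect the defended prompt; send it via query_llm in practice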

Files

Original bundle
Name: FACF_ACMOA_3691620.3695297.pdf
Size: 561.64 KB
Format: Adobe Portable Document Format
