Attacks and defenses for large language models on coding tasks

dc.contributor.author: Zhang, Chi, author
dc.contributor.author: Wang, Zifan, author
dc.contributor.author: Zhao, Ruoshi, author
dc.contributor.author: Mangal, Ravi, author
dc.contributor.author: Fredrikson, Matt, author
dc.contributor.author: Jia, Limin, author
dc.contributor.author: Pasareanu, Corina, author
dc.contributor.author: ACM, publisher
dc.date.accessioned: 2024-12-17T19:12:10Z
dc.date.available: 2024-12-17T19:12:10Z
dc.date.issued: 2024-10-27
dc.description.abstract: Modern large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities on coding tasks, including writing and reasoning about code. They improve upon previous neural network models of code, such as code2seq or seq2seq, which already achieved competitive results on tasks such as code summarization and identifying code vulnerabilities. However, these earlier code models were shown to be vulnerable to adversarial examples, i.e., small syntactic perturbations designed to "fool" the models. In this paper, we first study the transferability of adversarial examples, generated through white-box attacks on smaller code models, to LLMs. We also propose a new attack that uses an LLM to generate the perturbations. Further, we propose novel cost-effective techniques to defend LLMs against such adversaries via prompting, without incurring the cost of retraining. These prompt-based defenses involve modifying the prompt to include additional information, such as examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations. Our preliminary experiments show the effectiveness of the attacks and the proposed defenses on popular LLMs such as GPT-3.5 and GPT-4.
dc.format.medium: born digital
dc.format.medium: articles
dc.identifier.bibliographicCitation: Chi Zhang, Zifan Wang, Ruoshi Zhao, Ravi Mangal, Matt Fredrikson, Limin Jia, and Corina S. Păsăreanu. 2024. Attacks and Defenses for Large Language Models on Coding Tasks. In 39th IEEE/ACM International Conference on Automated Software Engineering (ASE '24), October 27 - November 1, 2024, Sacramento, CA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3691620.3695297
dc.identifier.doi: https://doi.org/10.1145/3691620.3695297
dc.identifier.uri: https://hdl.handle.net/10217/239729
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: Publications
dc.relation.ispartof: ACM DL Digital Library
dc.rights: © Chi Zhang, et al. ACM 2024. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ASE '24, https://dx.doi.org/10.1145/3691620.3695297.
dc.subject: LLMs
dc.subject: code models
dc.subject: adversarial attacks
dc.subject: robustness
dc.title: Attacks and defenses for large language models on coding tasks
dc.type: Text
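
The prompt-based defenses described in the abstract lend themselves to a short illustration. The following sketch is not taken from the paper: the few-shot perturbed/clean pair and the query_llm helper are hypothetical stand-ins, intended only to make concrete the idea of prepending an adversarially perturbed example and an explicit reversal instruction to a coding prompt.

# A minimal sketch of a prompt-based defense for coding tasks, assuming the
# strategy outlined in the abstract: augment the prompt with (a) an example of
# adversarially perturbed code and (b) an instruction to reverse perturbations.
# The example pair below is illustrative, not taken from the paper.

PERTURBED_EXAMPLE = """\
def f(x):
    unused_var_xq17 = 0  # dead code inserted by an attacker
    return x * 2
"""

CLEAN_EXAMPLE = """\
def f(x):
    return x * 2
"""

def build_defended_prompt(task: str, code: str) -> str:
    """Wrap a coding task in a defense prompt that warns the model about
    adversarial perturbations and shows one perturbed/clean pair."""
    return (
        "The following code may contain small adversarial perturbations "
        "(e.g., renamed identifiers or inserted dead code) designed to "
        "mislead you. Reverse any such perturbations before answering.\n\n"
        "Example of perturbed code:\n" + PERTURBED_EXAMPLE + "\n"
        "The same code with the perturbation removed:\n" + CLEAN_EXAMPLE + "\n"
        "Task: " + task + "\n\nCode:\n" + code
    )

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to an actual chat-completion API.
    raise NotImplementedError

if __name__ == "__main__":
    prompt = build_defended_prompt(
        task="Summarize what this function does.",
        code="def g(y):\n    tmp_z9 = None  # dead code\n    return y + 1\n",
    )
    print(prompt)  # inspect the defended prompt; send it via query_llm in practice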

Files

Original bundle
Name: FACF_ACMOA_3691620.3695297.pdf
Size: 561.64 KB
Format: Adobe Portable Document Format
