Harnessing large language models for permission fidelity analysis from android application descriptions

Tamrakar, Yunik, author; Ray, Indrakshi, advisor; Banerjee, Ritwik, advisor; Ghosh, Sudipto, committee member; Simske, Steve, committee member

Files

Tamrakar_colostate_0053N_18881.pdf (4.83 MB)

Date

2025

Authors

Tamrakar, Yunik, author

Ray, Indrakshi, advisor

Banerjee, Ritwik, advisor

Ghosh, Sudipto, committee member

Simske, Steve, committee member

Abstract

Android applications are widely used: as of mid-2024, the Google Play Store hosted over 2 million applications. With so many applications available for download, the threat of privacy leakage increases considerably, primarily because users have limited ability to tell which app permissions are actually necessary. Accurate and consistent checking of the permissions that applications request is therefore needed to protect user privacy. Studies have indicated that inferring permissions from app descriptions is an effective way to determine whether the requested permissions are necessary. Previous research on permission inference has explored keyword-based matching, natural language processing methods (including part-of-speech tagging and named entity recognition), and deep learning approaches based on recurrent neural networks. However, app descriptions are often vague and may omit details to meet sentence length restrictions, which results in suboptimal performance for these models. This limitation motivated our choice of large language models (LLMs), whose advanced contextual understanding and ability to infer implicit information can directly address the weaknesses observed in previous approaches. In this work, we explore several LLM architectures for the permission inference task and provide a detailed comparison across them. We evaluate both zero-shot and fine-tuning based approaches, demonstrating that fine-tuned models can achieve state-of-the-art performance. Additionally, by employing targeted generative AI based training data augmentation, we show that these fine-tuned models can significantly outperform baseline methods. Furthermore, we illustrate the potential of paraphrasing to boost fine-tuned performance by over 50 percent while using only a very small number of annotated samples, a rarity for LLMs.

Rights Access

Embargo expires: 05/28/2026.

Subject

android permissions

LLM

privacy

compliance

android applications

NLP

URI

https://hdl.handle.net/10217/240955

Collections

2020-
Theses and Dissertations
