Towards automated security and privacy policies specification and analysis

Abstract

Security and privacy policies, vital for information systems, are typically expressed in natural language documents. Security policy is represented by Access Control Policies (ACPs) within security requirements, which are initially drafted in natural language and subsequently translated into enforceable policy. The unstructured and ambiguous nature of natural language documents makes this manual translation tedious, expensive, labor-intensive, and error-prone. Privacy policies, in turn, pose their own challenges because of their length and complexity. Their dense language and extensive content can be overwhelming, hindering both novice users and experts from fully understanding the practices related to data collection and sharing. Disclosing these data practices to users, as mandated by privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), is of utmost importance. To address these challenges, we use Natural Language Processing (NLP) to automate the extraction of critical information from natural language documents and the analysis of security and privacy policies. This dissertation addresses two primary research questions:
Question 1: How can we automate the translation of Access Control Policies (ACPs) from natural language expressions to the formal model of Next Generation Access Control (NGAC) and subsequently analyze the generated model?
Question 2: How can we automate the extraction and analysis of data practices from privacy policies to ensure alignment with privacy regulations (GDPR and CCPA)?
Addressing these research questions requires a comprehensive framework comprising two key components. The first component, SR2ACM, translates natural language ACPs into the NGAC model and introduces a series of contributions to the analysis of security policies.
At the core of our contributions is an automated approach to constructing ACPs within the NGAC specification directly from natural language documents. Our approach integrates machine learning with software testing, a novel methodology for ensuring the quality of the extracted access control model. The second component, Privacy2Practice, automates the extraction and analysis of data practices from privacy policies written in natural language. We have developed an automated method to extract the data practices mandated by privacy regulations and to analyze how these practices are disclosed within privacy policies. The novelty of this research lies in creating a comprehensive framework that identifies the critical elements within security and privacy policies. This framework enables automated extraction and analysis of both types of policies directly from natural language documents.

Rights Access

Embargo expires: 08/16/2026.
