Repository logo

Automated extraction of access control policy from natural language documents


Data security and privacy are fundamental requirements in information systems. The first step to providing data security and privacy for organizations is defining access control policies (ACPs). Security requirements are often expressed in natural languages, and ACPs are embedded in the security requirements. However, ACPs in natural language are unstructured and ambiguous, so manually extracting ACPs from security requirements and translating them into enforceable policies is tedious, complex, expensive, labor-intensive, and error-prone. Thus, the automated ACPs specification process is crucial. In this thesis, we consider the Next Generation Access Control (NGAC) model as our reference formal access control model to study the automation process. This thesis addresses the research question: How do we automatically translate access control policies (ACPs) from natural language expression to the NGAC formal specification? Answering this research question entails building an automated extraction framework. The pro- posed framework aims to translate natural language ACPs into NGAC specifications automatically. The primary contributions of this research are developing models to construct ACPs in NGAC specification from natural language automatically and generating a realistic synthetic dataset of access control policies sentences to evaluate the proposed framework. Our experimental results are promising as we achieved, on average, an F1-score of 93 % when identifying ACPs sentences, an F1-score of 96 % when extracting NGAC relations between attributes, and an F1-score of 96% when extracting user attribute and 89% for object attribute from natural language access control policies.


Rights Access

Embargo expires: 12/29/2025.



Associated Publications