Demystifying Privacy Policies with Language Technologies: Progress and Challenges

Title: Demystifying Privacy Policies with Language Technologies: Progress and Challenges
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Wilson S, Schaub F, Dara A, Cherivirala SK, Zimmeck S, Andersen MS, Leon PG, Hovy E, Sadeh N
Conference Name: First Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS 2016)
Publisher: European Language Resources Association (ELRA)
Conference Location: Portorož, Slovenia
ISBN Number: 978-2-9517408-9-1

Privacy policies written in natural language are the predominant method that operators of websites and online services use to communicate privacy practices to their users. However, these documents are infrequently read by Internet users, due in part to the length and complexity of the text. These factors also inhibit the efforts of regulators to assess privacy practices or to enforce standards. One proposed approach to improving the status quo is to use a combination of methods from crowdsourcing, natural language processing, and machine learning to extract details from privacy policies and present them in an understandable fashion. We sketch out this vision and describe our ongoing work to bring it to fruition. Further, we discuss challenges associated with bridging the gap between the contents of privacy policy text and website users’ abilities to understand those policies. These challenges are motivated by the rich interconnectedness of the problems as well as the broader impact of helping Internet users understand their privacy choices. They could also provide a basis for competitions that use the annotated corpus introduced in this paper.