Think Before Your Click: Data and Models for Adult Content in Arabic Twitter

Title: Think Before Your Click: Data and Models for Adult Content in Arabic Twitter
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Alshehri A, Nagoudi EMoatez Bil, Alhuzali H, Abdul-Mageed M
Conference Name: Second Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS 2018)
Publisher: European Language Resources Association (ELRA)
Conference Location: Miyazaki, Japan
ISBN Number: 979-10-95546-00-9

Given the widespread use of social media and its growing role in our lives today, there is a pressing need to ensure the safety of these online spaces. In particular, the spread of adult content on social networks is considered undesirable by various social groups and may even pose a threat to others (e.g., children). In this work, we develop a unique, large-scale dataset of adult content in Arabic Twitter and provide in-depth analyses of the data. The dataset enables us to study the scope and distribution of adult content in the Arabic Twitter sphere, thus possibly uncovering target geographic locales. We also exploit the data to learn a large lexicon specific to the topic of adult content. We further utilize the data to detect spreaders of adult content on the microblogging platform. Our models achieve promising results, reaching 79% accuracy on the task (24% higher than a competitive baseline, p < 0.3).