DATE: Tue, Nov 30, 2021
TIME: 11:30 am
PLACE: On Zoom
TITLE: GAN-RoBERTa: a Robust Semi-Supervised Model for Detecting Anti-Asian COVID-19 Hate Speech on Social Media
PRESENTER: Yansong Li
University of Ottawa
ABSTRACT:

Anti-Asian speech during the COVID-19 pandemic has been a serious problem with severe consequences. Social media witnessed this problem through a wave of hate speech targeting Asian communities. Timely detection of anti-Asian COVID-19-related hate speech is of utmost importance, not only to allow the application of preventive mechanisms, but also to anticipate and possibly prevent other similar discriminatory situations. In this paper, we address the problem of detecting anti-Asian COVID-19-related hate speech from social media data. Previous approaches to this problem used a transformer-based model, BERT/RoBERTa, trained on a homologous annotated dataset, and achieved good performance on the task. However, this requires either massive data annotation or the collection of sufficiently related data samples with a strong correlation to the topic. Both are difficult goals to meet without reliable and vast resources. We propose a robust semi-supervised model, GAN-RoBERTa, that learns from a limited heterogeneous dataset and can be further enhanced by using unlabeled data. Compared with the RoBERTa baseline model, the experimental results show a substantial gain in the model's macro-F1 score, from 0.77 to 0.82, when using the unlabeled data. Our proposed model provides state-of-the-art performance while efficiently using unlabeled data, showing promising applicability to other complex classification tasks where large amounts of labeled examples are difficult to obtain.

This is joint work with Xuanyu Su, Paula Branco, and Diana Inkpen.