DATE: | Tue, Nov 30, 2021 |
TIME: | 11:30 am |
PLACE: | On Zoom |
TITLE: |
GAN-RoBERTa: a Robust Semi-Supervised Model for Detecting Anti-Asian
COVID-19 Hate Speech on Social Media
|
PRESENTER: |
Yansong Li
University of Ottawa
|
ABSTRACT:
|
Anti-Asian speech during the COVID-19 pandemic has been a serious problem
with severe consequences. Social media witnessed this problem through a
hate speech wave targeting Asian communities. Timely detection of
anti-Asian COVID-19-related hate speech is of utmost importance, not only
to allow the application of preventive mechanisms, but also to anticipate
and possibly prevent other similar discriminatory situations. In this
paper, we address the problem of detecting anti-Asian COVID-19-related
hate speech from social media data. Previous approaches that tackled this
problem utilized a transformer-based model, BERT/RoBERTa, trained on the
homologous annotated dataset and achieved good performance on this task.
However, this requires massive data annotations or the collection of
sufficiently related data samples with a strong correlation to the topic.
Both goals are difficult to meet without vast and reliable resources.
In this paper, we propose a robust semi-supervised model,
GAN-RoBERTa, that learns from a limited heterogeneous dataset which can be
further enhanced by using unlabeled data. Compared with the RoBERTa
baseline model, the experimental results show that the model's macro-F1
score improves substantially, from 0.77 to 0.82, when using the
unlabeled data. Our proposed model achieves state-of-the-art performance
while efficiently using unlabeled data, showing promising
applicability to other complex classification tasks where large amounts of
labeled examples are difficult to obtain.
This is joint work with Xuanyu Su, Paula Branco, and Diana Inkpen.