DATE: Wed, Jul 22, 2020
TIME: 1 pm
PLACE: Online on Zoom
TITLE: Predicting Depression Levels and Suicide Ideation within the Canadian Population from Social Media
PRESENTER: Ruba Skaik
University of Ottawa
ABSTRACT:

Mental illness costs Canada billions of dollars every year. Millions of people suffer from mental illness, and only a fraction receive adequate treatment. Identifying people with mental illness requires initiative from the person in need, available medical services, and sufficient time from professionals. These resources are not always available. Thus, analysing social media posts can play an important role in sensing mental health disorders throughout the Canadian population. Big data research on social media may also support standard surveillance approaches and provide decision-makers with usable information. More precisely, social media analysis has shown promising results for public health assessment and monitoring. In this research, we explore the task of automatically analysing social media textual data using Natural Language Processing (NLP) and Machine Learning (ML) techniques to detect signs of mental health disorders that need attention, such as depression and suicide ideation. Given the lack of comprehensive annotated data in this field, we propose a transfer learning methodology that uses the information learned from a training sample and applies it to a different dataset in order to choose the model that generalizes best for use at the population level. We also present evidence that ML models designed to predict depression and suicide ideation from Reddit data can use the knowledge they encoded to make predictions on Twitter data, even though the two platforms differ in purpose, structure, and limitations. As labeled data is not available at the population level, we compute the correlation between the results of the ML models and data from Statistics Canada.
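As a rough illustration of the model-selection idea described above (train on one platform, then pick the model that scores best on a different dataset), the following is a minimal sketch and not the presenter's actual pipeline. The model names, labels, and predictions are hypothetical placeholders, and using F1 as the selection criterion is an assumption made for this example.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best_model(
    target_labels: List[int],
    predictions_by_model: Dict[str, List[int]],
) -> Tuple[str, float]:
    """Return the model whose predictions on the target dataset score highest."""
    scores = {
        name: f1_score(target_labels, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical labels for a small annotated sample from the target platform.
target_labels = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from two models trained on the source platform.
predictions_by_model = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 0, 0, 0],
}

best_name, best_f1 = select_best_model(target_labels, predictions_by_model)
print(f"best model on target data: {best_name} (F1 = {best_f1:.3f})")
```

Selecting on a dataset from a different platform, rather than on a held-out split of the training data, favours models that transfer across platforms over models that only fit the source platform well.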
In our proposed models, we use feature engineering with conventional machine learning algorithms such as SVM, LR, RF, XGBoost, and GBDT, and we compare their results with deep learning algorithms such as LSTM, Bi-LSTM, and Convolutional Neural Network (CNN). We adopt the CNN model because it achieved the highest accuracy, 91%; this model will be used to estimate the depression level of the population. For suicide ideation, we use the XGBoost classifier, which predicted suicide ideation at the user level with an F1-score of 0.922, to estimate suicide ideation in the Canadian population. We compared the results with the suicidal thoughts reported by Statistics Canada for 2015 and achieved a correlation of 0.61 between the predicted and actual figures at the provincial level.
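The province-level comparison can be sketched as a correlation between two aligned lists, one of model estimates and one of reported survey figures. The abstract does not say which correlation coefficient was used; Pearson's r is assumed here. The region values below are made-up placeholders, not Statistics Canada data or results from the talk.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-region rates (one entry per region, same order in both lists).
predicted_rate = [0.10, 0.12, 0.09, 0.15]  # placeholder model estimates
reported_rate = [0.11, 0.13, 0.10, 0.12]   # placeholder survey figures

r = pearson_r(predicted_rate, reported_rate)
print(f"Pearson r across regions: {r:.2f}")
```

With only a small number of regions, a single coefficient is a coarse summary, so it is best read alongside the per-region differences.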