Predicting Depression Levels and Suicide Ideation within the Canadian Population from Social Media
PRESENTER:
Ruba Skaik
University of Ottawa
ABSTRACT:
Mental illness costs Canada billions of dollars every year.
Millions of people suffer from mental illness and only a fraction receives
adequate treatment.
Identifying people with mental illness requires the person in need to
seek help, medical services to be available, and professionals to have
sufficient time. These resources are not always available. Analysing
social media posts can therefore play an important role in detecting
signs of mental health disorders throughout the Canadian population. Big
data research
of social media may also support standard surveillance approaches and
provide decision-makers with usable information. More precisely, social
media analysis has shown promising
results for public health assessment and monitoring. In this research, we
explore the task
of automatically analysing social media textual data using Natural
Language Processing
(NLP) and Machine Learning (ML) techniques to detect signs of mental
health disorders
that need attention, such as depression and suicide ideation. Given the
lack of comprehensive annotated data in this field, we propose a
transfer learning methodology that takes the information learned from a
training sample and applies it to a different dataset, in order to
choose the model that generalizes best for use at the population level. We
also present evidence that ML models designed to predict depression and
suicide ideation using Reddit data can transfer the knowledge they have
encoded to make predictions on Twitter data, even though the two
platforms differ in purpose, structure, and limitations. As labeled data
is not available at the population level, we instead compute the
correlation between the results of the ML models and data from
Statistics Canada. In our proposed models, we use
feature engineering with conventional machine learning algorithms such
as support vector machines (SVM), logistic regression (LR), random
forests (RF), XGBoost, and gradient-boosted decision trees (GBDT), and
we compare their results with deep learning algorithms such as LSTM,
Bi-LSTM, and Convolutional Neural Network (CNN). We adopt the CNN model
because it achieved the highest accuracy, 91%; this model is used to
estimate the depression level of the population. For suicide ideation
detection, we
used the XGBoost classifier, which predicts suicide ideation at the
user level and achieved an F1-score of 0.922, to estimate suicide
ideation in the Canadian population. We compared the results with the
suicidal thoughts reported by Statistics Canada for 2015 and obtained a
correlation of 0.61 between the predicted and reported figures at the
provincial level.
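
The province-level comparison described above can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used or how user-level predictions were aggregated, so the choice of Pearson's r, the aggregation into a per-region share of flagged users, the helper names `pearson` and `regional_rate`, and every number below are assumptions made purely for illustration. None of the values come from the study or from Statistics Canada.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def regional_rate(user_predictions):
    """Share of users in a region flagged positive (labels are 0 or 1)."""
    return sum(user_predictions) / len(user_predictions)


# Hypothetical placeholder inputs, NOT data from the study.
# Each list holds one binary classifier output per user in that region.
predicted_labels = {
    "region_a": [1, 0, 0, 0],
    "region_b": [1, 1, 0, 0],
    "region_c": [1, 1, 1, 0],
}
# Hypothetical survey-reported rates for the same regions.
survey_rates = {"region_a": 0.10, "region_b": 0.20, "region_c": 0.25}

# Align both series on the same region order before correlating.
regions = sorted(predicted_labels)
predicted = [regional_rate(predicted_labels[r]) for r in regions]
reported = [survey_rates[r] for r in regions]

r = pearson(predicted, reported)
print(f"Pearson r across {len(regions)} regions: {r:.2f}")
```

With only a handful of regions, a coefficient like this is sensitive to individual data points, so it is best read as a rough agreement check rather than a precise measure of model quality.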