DATE: Thu, Oct 19, 2017
TIME: 1 pm
TITLE: Multimodal Sentiment Analysis Using Deep Multimodal Learning Structure
PRESENTER: Habibeh Naderi Khorshidi
Dalhousie University

The human brain recognizes the sentiment of an expressed opinion by integrating multiple sources of information. We perceive sentiment not only by analyzing the verbal content but also by attending to the audio and visual cues of how an utterance is expressed. A single source of information (e.g., text-based sentiment analysis) may not be enough to detect and handle ambiguity. However, the textual, audio and visual characteristics of a statement are strongly related, and their combination can resolve ambiguity to some extent. In this research, we want to understand the interaction patterns between spoken words and visual gestures. Hence, we propose a multimodal deep learning structure that automatically extracts salient features from textual, acoustic and visual data for sentiment analysis. We use a convolutional neural network (CNN) combined with a long short-term memory (LSTM) recurrent neural network (RNN) to extract visual features, and two independent LSTM RNNs to extract textual and acoustic features. We then search for an optimal configuration for combining all features into a joint representation, which forms our multimodal layer. Finally, on top of the multimodal layer, we place a softmax decision layer to obtain the predictions, i.e., the probability that the input example is positive or negative.
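The final two stages of the pipeline (joint representation, then softmax decision layer) can be illustrated with a minimal sketch. This is not the presenter's implementation: the abstract does not say how the features are combined, so simple concatenation is assumed here, and the feature sizes, the random weights, and the names `fuse` and `decision_layer` are all hypothetical. The per-modality feature vectors stand in for the outputs of the CNN+LSTM and LSTM extractors.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_feat, audio_feat, visual_feat):
    """Assumed fusion: concatenate the three modality vectors."""
    return list(text_feat) + list(audio_feat) + list(visual_feat)


def decision_layer(joint, weights, bias):
    """Linear layer followed by softmax over the two sentiment classes."""
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Placeholder features; in the described system these would come from the
# textual LSTM, the acoustic LSTM, and the visual CNN+LSTM respectively.
rng = random.Random(0)
text_feat = [rng.uniform(-1, 1) for _ in range(4)]
audio_feat = [rng.uniform(-1, 1) for _ in range(3)]
visual_feat = [rng.uniform(-1, 1) for _ in range(5)]

joint = fuse(text_feat, audio_feat, visual_feat)

# Untrained, randomly initialised weights: one row per class.
weights = [[rng.uniform(-0.5, 0.5) for _ in joint] for _ in range(2)]
bias = [0.0, 0.0]

probs = decision_layer(joint, weights, bias)
print("P(negative), P(positive):", probs)
```

Concatenation is only one candidate configuration; the search for a better one described in the abstract would amount to swapping out the body of `fuse` while leaving the decision layer unchanged.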