DATE: | Wed, Feb 5, 2014 |
TIME: | 2:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Textual Risk Mining for Maritime Situational Awareness |
PRESENTER: | Amir H. Razavi, University of Ottawa |
ABSTRACT:
We propose an auxiliary Machine Learning (ML) and Natural Language
Processing (NLP) integrated system for maritime situational awareness
(MSA) operations. The system adds a new and influential asset, human
intuition and perception, to existing semi-automated decision support
systems, which rely mostly on numerical data collected by electronic
sensors or cameras located either directly on the vessels or in the
maritime command-and-control centers.
We gathered weekly textual reports spanning twelve months from the
United States Worldwide Threats to Shipping Reports repository of the
National Geospatial-Intelligence Agency (NGA). We treat these maritime
incident reports, written by human operators, as a valuable and
accessible source of unstructured text. A span of text is called a "risk"
if it expresses one of the following kinds of vessel incidents: fired
upon, robbed, boarded, hijacked, attacked, chased, approached, kidnapped,
boarding attempted, suspiciously approached or clashed with.
Our approach uses probability distributions of features annotated with a
set of lexicons containing expressions that denote vessel types, risk
types, risk associates, maritime geographical locations, dates and times.
These distributions are captured and used to anchor the "risk" spans as
they are described in the textual reports.
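
As a rough illustration of lexicon-based feature annotation, the sketch
below tags tokens by longest match against a few small lexicons and then
computes the relative frequency of each tag. The lexicon entries, the tag
names and the helper functions (tokenize, annotate, label_distribution)
are all hypothetical; the abstract does not specify the actual lexicons
or how the distributions are estimated.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration only; the real lexicons
# used by the system are not listed in the abstract.
LEXICONS = {
    "VESSEL": {"tanker", "bulk carrier", "fishing vessel"},
    "RISK": {"boarded", "hijacked", "robbed", "fired upon"},
    "LOCATION": {"gulf of aden", "strait of malacca"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def annotate(text, lexicons=LEXICONS, max_len=3):
    """Tag each token with a lexicon label, preferring the longest match."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)  # "O" marks tokens outside every lexicon
    i = 0
    while i < len(tokens):
        step = 1
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            label = next(
                (name for name, entries in lexicons.items() if phrase in entries),
                None,
            )
            if label is not None:
                tags[i:i + n] = [label] * n
                step = n
                break
        i += step
    return list(zip(tokens, tags))


def label_distribution(annotated):
    """Relative frequency of each tag over the annotated tokens."""
    counts = Counter(tag for _, tag in annotated)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


annotated = annotate("Pirates boarded a bulk carrier in the Gulf of Aden.")
distribution = label_distribution(annotated)
```

Longest-match-first lookup keeps multi-word expressions such as "bulk
carrier" together instead of tagging their words separately.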
After pre-processing steps that include tokenization, named entity
extraction and part-of-speech tagging, the textual risk mining system
applies several sequence classification algorithms, e.g., Conditional
Random Fields, Conditional Markov Models and Hidden Markov Models, and
compares their risk classification performance. Empirical results show
that our NLP/ML-based system can extract variable-length risk spans from
the textual reports with about 90% correctness.
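
To make the sequence-classification step concrete, here is a minimal
sketch of one of the model families named above: Viterbi decoding for a
two-state Hidden Markov Model, followed by grouping consecutive RISK
labels into variable-length spans. Every probability, the state names
and the observation symbols are made-up assumptions for illustration;
they are not parameters or results from the presented system.

```python
import math

STATES = ("O", "RISK")

# Toy parameters, assumed for illustration only (not learned from data).
START = {"O": 0.8, "RISK": 0.2}
TRANS = {
    "O": {"O": 0.8, "RISK": 0.2},
    "RISK": {"O": 0.3, "RISK": 0.7},
}
EMIT = {
    "O": {"other": 0.8, "risk_word": 0.05, "vessel": 0.15},
    "RISK": {"other": 0.3, "risk_word": 0.4, "vessel": 0.3},
}


def viterbi(observations):
    """Return the most probable state sequence for the observations."""
    if not observations:
        return []
    # Work in log space to avoid underflow on long sequences.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: scores[p] + math.log(TRANS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def risk_spans(labels):
    """Group consecutive RISK labels into half-open (start, end) spans."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == "RISK" and start is None:
            start = i
        elif label != "RISK" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


observations = ["other", "risk_word", "vessel", "other", "other"]
labels = viterbi(observations)
spans = risk_spans(labels)
```

The same span-grouping step would apply to the label sequence produced
by any of the listed classifiers, since each assigns one label per token.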