Abstract
We predict which claim in a political debate should be prioritized
for fact-checking. A particular challenge is, given a debate, to
produce a ranked list of its sentences based on their worthiness for
fact-checking. We develop a Recurrent Neural Network (RNN) model that
learns a sentence embedding, which is then used to predict the check-worthiness
of a sentence. Our sentence embedding encodes both semantic
and syntactic dependencies using pretrained word2vec word embeddings
as well as part-of-speech tagging and syntactic dependency parsing. This
results in a multi-representation of each word, which we use as input to an
RNN with GRU memory units; the output from each word is aggregated
using attention, followed by a fully connected layer, from which the output
is predicted using a sigmoid function. Our techniques perform well overall,
achieving the second-best performing run
(MAP: 0.1152) in the competition, as well as the highest overall performance
(MAP: 0.1810) for our contrastive run, a 32% improvement
over the second highest MAP score in the English language category. In
our primary run we combined our sentence embedding with state-of-the-art
check-worthiness features, whereas in the contrastive run we used
our sentence embedding alone.
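The pipeline the abstract describes (multi-representation word inputs, a GRU run over the sentence, attention-weighted aggregation into a sentence embedding, then a fully connected layer with a sigmoid output) can be sketched in plain NumPy. All dimensions, random initialisations, and parameter names below are illustrative assumptions for a single forward pass, not the paper's actual configuration or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed, not taken from the paper)
d_word, d_pos, d_hid = 8, 4, 6
d_in = d_word + d_pos          # multi-representation size per word
T = 5                          # sentence length in words

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised GRU parameters (z: update gate, r: reset gate, h: candidate)
Wz, Uz, bz = rng.normal(size=(d_hid, d_in)), rng.normal(size=(d_hid, d_hid)), np.zeros(d_hid)
Wr, Ur, br = rng.normal(size=(d_hid, d_in)), rng.normal(size=(d_hid, d_hid)), np.zeros(d_hid)
Wh, Uh, bh = rng.normal(size=(d_hid, d_in)), rng.normal(size=(d_hid, d_hid)), np.zeros(d_hid)

def gru_step(x, h):
    """One GRU memory-unit update for input x and previous hidden state h."""
    z = sigmoid(Wz @ x + Uz @ h + bz)
    r = sigmoid(Wr @ x + Ur @ h + br)
    h_cand = np.tanh(Wh @ x + Uh @ (r * h) + bh)
    return (1 - z) * h + z * h_cand

# Each word: a word2vec-style vector concatenated with an encoding of its
# POS tag / dependency role (both random stand-ins here)
sentence = [np.concatenate([rng.normal(size=d_word), rng.normal(size=d_pos)])
            for _ in range(T)]

# Run the GRU over the sentence, keeping the output at every word
h = np.zeros(d_hid)
states = []
for x in sentence:
    h = gru_step(x, h)
    states.append(h)
H = np.stack(states)           # shape (T, d_hid)

# Attention: score each word's output, softmax-normalise, and take the
# weighted sum as the sentence embedding
w_att = rng.normal(size=d_hid)
scores = H @ w_att
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()
sent_emb = alpha @ H           # shape (d_hid,)

# Fully connected layer + sigmoid -> check-worthiness score in (0, 1)
W_fc, b_fc = rng.normal(size=d_hid), 0.0
score = sigmoid(W_fc @ sent_emb + b_fc)
```

In the full system this score would be computed for every sentence in a debate and used to rank the sentences; here only one sentence is scored to keep the sketch minimal.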
Original language | English |
---|---|
Title of host publication | CLEF 2018 Working Notes |
Editors | Linda Cappellato, Nicola Ferro, Jian-Yun Nie, Laure Soulier |
Number of pages | 8 |
Publisher | CEUR-WS.org |
Publication date | 2018 |
Edition | 10 |
Article number | 81 |
Publication status | Published - 2018 |
Event | 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France. Duration: 10 Sept 2018 → 14 Sept 2018 |
Conference
Conference | 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 |
---|---|
Country/Territory | France |
City | Avignon |
Period | 10/09/2018 → 14/09/2018 |
Series | CEUR Workshop Proceedings |
---|---|
Volume | 2125 |
ISSN | 1613-0073 |
Keywords
- CNN
- Fact checking
- Political debates
- RNN