Abstract
We predict which claims in a political debate should be prioritized
for fact-checking. A particular challenge is, given a debate, to
produce a ranked list of its sentences by their worthiness for fact
checking. We develop a Recurrent Neural Network (RNN) model that
learns a sentence embedding, which is then used to predict the
check-worthiness of a sentence. Our sentence embedding encodes both
semantic and syntactic dependencies using pretrained word2vec word
embeddings as well as part-of-speech tagging and syntactic dependency
parsing. This yields a multi-representation of each word, which we use
as input to an RNN with GRU memory units; the per-word outputs are
aggregated using attention, followed by a fully connected layer, from
whose output the check-worthiness score is predicted using a sigmoid
function. Our techniques perform well overall, achieving the second
best performing run (MAP: 0.1152) in the competition, as well as the
highest overall performance (MAP: 0.1810) with our contrastive run, a
32% improvement over the second highest MAP score in the English
language category. In our primary run we combined our sentence
embedding with state-of-the-art check-worthiness features, whereas in
the contrastive run we considered our sentence embedding alone.
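The pipeline described in the abstract can be sketched in a few lines of NumPy. Everything below is illustrative: the dimensions, parameter names, random initialization, and the from-scratch GRU/attention formulation are assumptions for exposition, not the authors' actual implementation or trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step. W: (3, hid, in), U: (3, hid, hid), b: (3, hid)."""
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])        # update gate
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])        # reset gate
    n = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
    return (1 - z) * h + z * n

def score_sentence(word_vecs, pos_onehots, params):
    """Multi-representation -> GRU -> attention -> dense -> sigmoid."""
    W, U, b, v_att, w_out, b_out = params
    h = np.zeros(U.shape[1])
    states = []
    for wv, pv in zip(word_vecs, pos_onehots):
        # multi-representation: word2vec vector concatenated with POS one-hot
        # (a dependency-parse feature would be concatenated the same way)
        x = np.concatenate([wv, pv])
        h = gru_step(x, h, W, U, b)
        states.append(h)
    H = np.stack(states)                 # (T, hid) per-word GRU outputs
    e = H @ v_att                        # attention scores, one per word
    a = np.exp(e - e.max()); a /= a.sum()
    s = a @ H                            # attended sentence embedding
    return sigmoid(w_out @ s + b_out)    # check-worthiness probability

# Toy dimensions and random (untrained) parameters, purely for illustration.
emb, pos, hid, T = 8, 4, 6, 5
W = rng.normal(size=(3, hid, emb + pos)) * 0.1
U = rng.normal(size=(3, hid, hid)) * 0.1
b = np.zeros((3, hid))
params = (W, U, b, rng.normal(size=hid), rng.normal(size=hid), 0.0)

words = rng.normal(size=(T, emb))                      # stand-in word2vec vectors
pos_tags = np.eye(pos)[rng.integers(0, pos, size=T)]   # stand-in one-hot POS tags
p = score_sentence(words, pos_tags, params)            # a probability in (0, 1)
```

In the actual system the sigmoid output would be used to rank all sentences of a debate by descending score; here the parameters are random, so the score only demonstrates the data flow.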
Original language | English |
---|---|
Title | CLEF 2018 Working Notes |
Editors | Linda Cappellato, Nicola Ferro, Jian-Yun Nie, Laure Soulier |
Number of pages | 8 |
Publisher | CEUR-WS.org |
Publication date | 2018 |
Edition | 10 |
Article number | 81 |
Status | Published - 2018 |
Event | 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France. Duration: 10 Sep 2018 → 14 Sep 2018 |

Conference

Conference | 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 |
---|---|
Country/Territory | France |
City | Avignon |
Period | 10/09/2018 → 14/09/2018 |

Name | CEUR Workshop Proceedings |
---|---|
Volume | 2125 |
ISSN | 1613-0073 |