A strong baseline for question relevancy ranking

Ana Valeria Gonzalez; Isabelle Augenstein; Anders Søgaard

Abstract

The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks (a task that amounts to question relevancy ranking) involve complex pipelines and manual feature engineering. Despite this, many of them still fail to beat the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed-forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared-task systems on the task of retrieving relevant previously asked questions.
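To make the idea of a "bag of distance measures for the input question pair" concrete, here is a minimal sketch of how such a feature vector could be computed. It is not the authors' implementation: the abstract does not list the 14 measures, so the five used here (token Jaccard, bag-of-words cosine, character-trigram Jaccard, normalized Levenshtein distance, relative length difference) and every function name are hypothetical illustrations. Like the features described in the abstract, they need no language-specific resources.

```python
import math
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Lowercased whitespace tokenization; no language-specific tools needed.
    return text.lower().split()


def _char_ngrams(text: str, n: int = 3) -> set[str]:
    # Set of lowercased character n-grams (trigrams by default).
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_distance(a: set, b: set) -> float:
    # 1 - |A intersect B| / |A union B|; two empty sets count as identical.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cosine_distance(a: Counter, b: Counter) -> float:
    # 1 - cosine similarity of two bag-of-words count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    if norm == 0:
        return 1.0
    # Clamp to avoid tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - dot / norm)


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def distance_features(q1: str, q2: str) -> list[float]:
    # Hypothetical feature bag for one question pair (5 measures, not 14).
    t1, t2 = _tokens(q1), _tokens(q2)
    max_chars = max(len(q1), len(q2)) or 1
    max_toks = max(len(t1), len(t2)) or 1
    return [
        jaccard_distance(set(t1), set(t2)),
        cosine_distance(Counter(t1), Counter(t2)),
        jaccard_distance(_char_ngrams(q1), _char_ngrams(q2)),
        levenshtein(q1.lower(), q2.lower()) / max_chars,
        abs(len(t1) - len(t2)) / max_toks,
    ]


if __name__ == "__main__":
    print(distance_features("How do I renew my visa?",
                            "How can I renew a visa?"))
```

Under this sketch, each question pair becomes a short, fixed-length vector of numbers, which would then serve as the input layer of a small feed-forward network; how the paper shares that network across tasks is not shown here.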

Original language	Danish
Title of host publication	Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Publisher	Association for Computational Linguistics
Publication date	2018
Pages	4810–4815
Status	Published - 2018
Event	2018 Conference on Empirical Methods in Natural Language Processing - Brussels, Belgium. Duration: 31 Oct 2018 → 4 Nov 2018

Conference

Conference	2018 Conference on Empirical Methods in Natural Language Processing
Country/Region	Belgium
City	Brussels
Period	31/10/2018 → 04/11/2018

Citation formats

A strong baseline for question relevancy ranking. / Gonzalez, Ana Valeria; Augenstein, Isabelle; Søgaard, Anders.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018. p. 4810–4815.

Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-review

@inproceedings{2031473e11d14ec2b8904b8944f09b99,

title = "A strong baseline for question relevancy ranking",

abstract = "The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks - a task that amounts to question relevancy ranking - involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.",

author = "Gonzalez, {Ana Valeria} and Isabelle Augenstein and Anders S{\o}gaard",

year = "2018",

language = "Danish",

pages = "4810--4815",

booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",

publisher = "Association for Computational Linguistics",

note = "2018 Conference on Empirical Methods in Natural Language Processing ; Conference date: 31-10-2018 Through 04-11-2018",

}

TY  - GEN
T1  - A strong baseline for question relevancy ranking
AU  - Gonzalez, Ana Valeria
AU  - Augenstein, Isabelle
AU  - Søgaard, Anders
PY  - 2018
Y1  - 2018
N2  - The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks - a task that amounts to question relevancy ranking - involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.
AB  - The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks - a task that amounts to question relevancy ranking - involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.
M3  - Article in proceedings
SP  - 4810
EP  - 4815
BT  - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
PB  - Association for Computational Linguistics
T2  - 2018 Conference on Empirical Methods in Natural Language Processing
Y2  - 31 October 2018 through 4 November 2018
ER  - 