A strong baseline for question relevancy ranking

Ana Valeria Gonzalez; Isabelle Augenstein; Anders Søgaard

Abstract

The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks - a task that amounts to question relevancy ranking - involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.
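As a rough illustration of what a "bag of distance measures for the input question pair" can look like, the sketch below computes three language-independent distances over character trigrams. It rests on assumptions: this record does not list the 14 measures the paper uses, so Jaccard, Dice, and cosine are stand-ins, and the function names (`char_ngrams`, `distance_features`) are hypothetical. The multi-task feed-forward network that consumes such a feature vector is not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams; needs no language-specific tokenizer."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the sets of n-gram types."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def dice_distance(a, b):
    """1 - 2|A ∩ B| / (|A| + |B|) over the sets of n-gram types."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 1.0 - 2.0 * len(sa & sb) / total if total else 0.0


def cosine_distance(a, b):
    """1 - cosine similarity of the n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return max(0.0, 1.0 - dot / norm) if norm else 0.0


def distance_features(q1, q2):
    """Illustrative feature vector for a question pair (3 measures, not the paper's 14)."""
    a, b = char_ngrams(q1), char_ngrams(q2)
    return [jaccard_distance(a, b), dice_distance(a, b), cosine_distance(a, b)]
```

Each question pair would be mapped to one such vector, which a small feed-forward classifier could then score for relevance.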

Original language	English
Title of host publication	Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Publisher	Association for Computational Linguistics
Publication date	2018
Pages	4810–4815
Publication status	Published - 2018
Event	2018 Conference on Empirical Methods in Natural Language Processing - Brussels, Belgium. Duration: 31 Oct 2018 → 4 Nov 2018

Conference

Conference	2018 Conference on Empirical Methods in Natural Language Processing
Country/Territory	Belgium
City	Brussels
Period	31/10/2018 → 04/11/2018

Cite this

A strong baseline for question relevancy ranking. / Gonzalez, Ana Valeria; Augenstein, Isabelle; Søgaard, Anders.

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018. p. 4810–4815.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

@inproceedings{2031473e11d14ec2b8904b8944f09b99,

title = "A strong baseline for question relevancy ranking",

abstract = "The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks - a task that amounts to question relevancy ranking - involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.",

author = "Gonzalez, {Ana Valeria} and Isabelle Augenstein and Anders S{\o}gaard",

year = "2018",

language = "English",

pages = "4810--4815",

booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",

publisher = "Association for Computational Linguistics",

note = "2018 Conference on Empirical Methods in Natural Language Processing ; Conference date: 31-10-2018 Through 04-11-2018",

}

TY - GEN

T1 - A strong baseline for question relevancy ranking

AU - Gonzalez, Ana Valeria

AU - Augenstein, Isabelle

AU - Søgaard, Anders

PY - 2018

Y1 - 2018

N2 - The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks - a task that amounts to question relevancy ranking - involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.

M3 - Conference contribution in proceedings

SP - 4810

EP - 4815

BT - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

PB - Association for Computational Linguistics

T2 - 2018 Conference on Empirical Methods in Natural Language Processing

Y2 - 31 October 2018 through 4 November 2018

ER -