A strong baseline for question relevancy ranking

    Abstract

    The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks (tasks that amount to question relevancy ranking) involve complex pipelines and manual feature engineering. Despite this, many of them still fail to beat the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed-forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.

    Original language: Danish
    Title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
    Publisher: Association for Computational Linguistics
    Publication date: 2018
    Pages: 4810–4815
    Status: Published - 2018
    Event: 2018 Conference on Empirical Methods in Natural Language Processing - Brussels, Belgium
    Duration: 31 Oct 2018 to 4 Nov 2018

    Conference

    Conference: 2018 Conference on Empirical Methods in Natural Language Processing
    Country/Territory: Belgium
    City: Brussels
    Period: 31/10/2018 to 04/11/2018
