Learning what to share between loosely related tasks

Sebastian Ruder; Joachim Bingel; Isabelle Augenstein; Anders Søgaard

Abstract

Multi-task learning is motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks. In Natural Language Processing (NLP), it is hard to predict if sharing will lead to improvements, particularly if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing. Our framework generalizes previous proposals in enabling sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing and b) while sluice networks easily fit noise, they are robust across domains in practice.

Original language	English
Journal	arXiv
Status	Published - 23 May 2017

Keywords

stat.ML
cs.AI
cs.CL
cs.LG
cs.NE

Access to document

Learning what to share between loosely related tasks: Submitted manuscript, 610 KB

Citation formats

@article{0df041fc199d497f86d5be12f1ebf6fa,

title = "Learning what to share between loosely related tasks",

abstract = "Multi-task learning is motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks. In Natural Language Processing (NLP), it is hard to predict if sharing will lead to improvements, particularly if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing. Our framework generalizes previous proposals in enabling sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing and b) while sluice networks easily fit noise, they are robust across domains in practice.",

keywords = "stat.ML, cs.AI, cs.CL, cs.LG, cs.NE",

author = "Sebastian Ruder and Joachim Bingel and Isabelle Augenstein and Anders S{\o}gaard",

note = "12 pages, 3 figures, 6 tables",

year = "2017",

month = may,

day = "23",

language = "English",

journal = "arXiv",

}

TY - JOUR

T1 - Learning what to share between loosely related tasks

AU - Ruder, Sebastian

AU - Bingel, Joachim

AU - Augenstein, Isabelle

AU - Søgaard, Anders

N1 - 12 pages, 3 figures, 6 tables

PY - 2017/5/23

Y1 - 2017/5/23

N2 - Multi-task learning is motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks. In Natural Language Processing (NLP), it is hard to predict if sharing will lead to improvements, particularly if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing. Our framework generalizes previous proposals in enabling sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing and b) while sluice networks easily fit noise, they are robust across domains in practice.

AB - Multi-task learning is motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks. In Natural Language Processing (NLP), it is hard to predict if sharing will lead to improvements, particularly if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing. Our framework generalizes previous proposals in enabling sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing and b) while sluice networks easily fit noise, they are robust across domains in practice.

KW - stat.ML

KW - cs.AI

KW - cs.CL

KW - cs.LG

KW - cs.NE

M3 - Journal article

JO - arXiv

JF - arXiv

ER -
