Active learning with support vector machines

Jan Kremer; Kim Steenstrup Pedersen; Christian Igel

doi:10.1002/widm.1132

Active learning with support vector machines

Jan Kremer, Kim Steenstrup Pedersen, Christian Igel

Department of Computer Science

59 Citations (Scopus)

Abstract

In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results coming from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well-suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel-induced feature space, which makes expressing the distance of a data point from the decision boundary straightforward. Furthermore, heuristics can efficiently help estimate how strongly learning from a data point influences the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs. For further resources related to this article, please visit the WIREs website. Conflict of interest: The authors have declared no conflicts of interest for this article.

Original language	English
Journal	Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume	4
Issue number	4
Pages (from-to)	313-326
Number of pages	14
ISSN	1942-4787
DOIs	https://doi.org/10.1002/widm.1132
Publication status	Published - 2014

Access to Document

10.1002/widm.1132

Cite this

@article{d2463dfe172c4930b064828d9457ac01,

title = "Active learning with support vector machines",

abstract = "In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results coming from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well-suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel-induced feature space, which makes expressing the distance of a data point from the decision boundary straightforward. Furthermore, heuristics can efficiently help estimate how strongly learning from a data point influences the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs. For further resources related to this article, please visit the WIREs website. Conflict of interest: The authors have declared no conflicts of interest for this article.",

author = "Jan Kremer and Pedersen, {Kim Steenstrup} and Christian Igel",

year = "2014",

doi = "10.1002/widm.1132",

language = "English",

volume = "4",

pages = "313--326",

journal = "Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery",

issn = "1942-4787",

publisher = "JohnWiley & Sons Ltd",

number = "4",

}

TY - JOUR

T1 - Active learning with support vector machines

AU - Kremer, Jan

AU - Pedersen, Kim Steenstrup

AU - Igel, Christian

PY - 2014

Y1 - 2014

N2 - In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results coming from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well-suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel-induced feature space, which makes expressing the distance of a data point from the decision boundary straightforward. Furthermore, heuristics can efficiently help estimate how strongly learning from a data point influences the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs. For further resources related to this article, please visit the WIREs website. Conflict of interest: The authors have declared no conflicts of interest for this article.

AB - In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results coming from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well-suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel-induced feature space, which makes expressing the distance of a data point from the decision boundary straightforward. Furthermore, heuristics can efficiently help estimate how strongly learning from a data point influences the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs. For further resources related to this article, please visit the WIREs website. Conflict of interest: The authors have declared no conflicts of interest for this article.

U2 - 10.1002/widm.1132

DO - 10.1002/widm.1132

M3 - Review

SN - 1942-4787

VL - 4

SP - 313

EP - 326

JO - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

JF - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

IS - 4

ER -

Active learning with support vector machines

Abstract

Access to Document

Fingerprint

Cite this