High-school dropout prediction using machine learning: a Danish large-scale study

Nicolae-Bogdan Şara, Rasmus Halland, Christian Igel, Stephen Alstrup

Abstract

Pupils who do not finish their secondary education are a major societal problem. Previous studies indicate that machine learning can be used to predict high-school dropout, which allows early interventions. To the best of our knowledge, this paper presents the first large-scale study of that kind. It considers pupils who were at least six months into their Danish high-school education, with the goal of predicting dropout in the subsequent three months. We combined information from the MaCom Lectio study administration system, which is used by most Danish high schools, with data from public online sources (name database, travel planner, governmental statistics). In contrast to existing studies, which were based on only a few hundred students, we considered a considerably larger sample of 36299 pupils for training and 36299 for testing. We evaluated different machine learning methods. A random forest classifier achieved an accuracy of 93.47% and an area under the curve of 0.965. Given the large sample, we conclude that machine learning can be used to reliably detect high-school dropout from information already available to many schools.
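
The two evaluation figures quoted in the abstract, accuracy and area under the curve, can be illustrated with a small sketch. It assumes that "area under the curve" means the area under the ROC curve, which the abstract does not state explicitly. The function names, the 0.5 decision threshold, and the toy labels and scores are all invented for illustration; none of them come from the study or its data.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of pupils whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a random dropout is scored above a random non-dropout.

    This pairwise definition equals the area under the ROC curve.
    A tie between a positive and a negative score counts as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (1 = dropout, 0 = no dropout).
toy_labels = [0, 0, 0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.05, 0.9]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"accuracy={toy_accuracy:.3f}, auc={toy_auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas the area under the ROC curve depends only on how the scores rank the pupils, which is one reason both figures are commonly reported together. The pairwise loop is quadratic and only suitable for small examples; for a sample of the size described in the abstract, a rank-based computation would be the more practical choice.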

Original language: English
Title of host publication: Proceedings. ESANN 2015: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Editors: Michel Verleysen
Number of pages: 6
Publisher: i6doc.com
Publication date: 2015
Pages: 319-324
ISBN (Print): 978-2-87587-014-8
ISBN (Electronic): 978-2-87587-015-5
Publication status: Published - 2015
Event: 23rd European Symposium on Artificial Neural Networks - Bruges, Belgium
Duration: 22 Apr 2015 to 24 Apr 2015
Conference number: 23

Conference

Conference: 23rd European Symposium on Artificial Neural Networks
Number: 23
Country/Territory: Belgium
City: Bruges
Period: 22/04/2015 to 24/04/2015
