High-school dropout prediction using machine learning: a Danish large-scale study

Nicolae-Bogdan Şara, Rasmus Halland, Christian Igel, Stephen Alstrup

21 Citations (Scopus)
434 Downloads (Pure)

Abstract

Pupils who do not finish their secondary education are a major societal problem. Previous studies indicate that machine learning can be used to predict high-school dropout, which allows early interventions. To the best of our knowledge, this paper presents the first large-scale study of that kind. It considers pupils who were at least six months into their Danish high-school education, with the goal of predicting dropout in the subsequent three months. We combined information from the MaCom Lectio study administration system, which is used by most Danish high schools, with data from public online sources (name database, travel planner, governmental statistics). In contrast to existing studies, which were based on only a few hundred students, we considered a considerably larger sample of 36299 pupils for training and 36299 for testing. We evaluated different machine learning methods. A random forest classifier achieved an accuracy of 93.47% and an area under the curve of 0.965. Given the large sample, we conclude that machine learning can be used to reliably detect high-school dropout from information already available to many schools.
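The two figures reported in the abstract, accuracy and area under the curve, can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes the curve is the ROC curve, that the classifier outputs a dropout score per pupil, and that a threshold of 0.5 turns scores into predictions. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of pupils whose thresholded score matches the true label."""
    # Assumption: a score at or above the threshold is read as "dropout" (1).
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random dropout is scored above a random
    non-dropout, counting ties as one half (pairwise definition of ROC AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (1 = dropout, 0 = no dropout).
toy_labels = [0, 0, 0, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.6, 0.3, 0.8, 0.4]

print(f"accuracy = {accuracy(toy_labels, toy_scores):.3f}")
print(f"AUC      = {roc_auc(toy_labels, toy_scores):.3f}")
```

Accuracy depends on the chosen threshold and on class balance, whereas AUC summarizes how well the scores rank dropouts above non-dropouts across all thresholds, which is one reason both are commonly reported together. The pairwise loop is quadratic and is meant for readability, not for samples of the size used in the study.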

Original language: English
Title of host publication: Proceedings. ESANN 2015: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Editors: Michel Verleysen
Number of pages: 6
Publisher: i6doc.com
Publication date: 2015
Pages: 319-324
ISBN (Print): 978-2-87587-014-8
ISBN (Electronic): 978-2-87587-015-5
Status: Published - 2015
Event: 23rd European Symposium on Artificial Neural Networks - Bruges, Belgium
Duration: 22 Apr 2015 to 24 Apr 2015
Conference number: 23

Conference

Conference: 23rd European Symposium on Artificial Neural Networks
Number: 23
Country/Territory: Belgium
City: Bruges
Period: 22/04/2015 to 24/04/2015
