Abstract
Pupils who do not finish their secondary education pose a major societal problem. Previous studies indicate that machine learning can be used to predict high-school dropout, which allows early interventions. To the best of our knowledge, this paper presents the first large-scale study of that kind. It considers pupils who were at least six months into their Danish high-school education, with the goal of predicting dropout in the subsequent three months. We combined information from the MaCom Lectio study administration system, which is used by most Danish high schools, with data from public online sources (name database, travel planner, governmental statistics). In contrast to existing studies that were based on only a few hundred students, we considered a considerably larger sample of 36299 pupils for training and 36299 for testing. We evaluated different machine learning methods. A random forest classifier achieved an accuracy of 93.47% and an area under the curve of 0.965. Given the large sample, we conclude that machine learning can reliably detect high-school dropout using information already available to many schools.
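The abstract reports two evaluation figures, accuracy and area under the curve. The sketch below illustrates how such figures are commonly computed for a binary dropout classifier. It assumes the curve in question is the ROC curve, which the abstract does not state explicitly. The function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of pupils whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen dropout (label 1) receives a higher
    score than a randomly chosen non-dropout (label 0); ties count as half.
    This pairwise formulation is equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only (1 = dropout, 0 = no dropout).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# Hypothetical decision threshold of 0.5 to turn scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise loop is quadratic in the number of pupils, so for a test set of the size reported in the abstract a rank-based implementation would be the more practical choice. Accuracy depends on the chosen threshold, whereas the area under the curve does not, which is one reason to report both.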
Original language | English |
---|---|
Title of host publication | Proceedings. ESANN 2015 : 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning |
Editors | Michel Verleysen |
Number of pages | 6 |
Publisher | i6doc.com |
Publication date | 2015 |
Pages | 319-324 |
ISBN (Print) | 978-2-87587-014-8 |
ISBN (Electronic) | 978-2-87587-015-5 |
Publication status | Published - 2015 |
Event | 23rd European Symposium on Artificial Neural Networks, Bruges, Belgium. Duration: 22 Apr 2015 → 24 Apr 2015. Conference number: 23 |
Conference
Conference | 23rd European Symposium on Artificial Neural Networks |
---|---|
Number | 23 |
Country/Territory | Belgium |
City | Bruges |
Period | 22/04/2015 → 24/04/2015 |