TY - GEN
T1 - Knowledge sharing for population based neural network training
AU - Oehmcke, Stefan
AU - Kramer, Oliver
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Finding good hyper-parameter settings to train neural networks is challenging, as the optimal settings can change during the training phase and also depend on random factors such as weight initialization or random batch sampling. Most state-of-the-art methods for the adaptation of these settings are either static (e.g. learning rate scheduler) or dynamic (e.g. ADAM optimizer), but only change some of the hyper-parameters and do not deal with the initialization problem. In this paper, we extend the asynchronous evolutionary algorithm, population based training, which modifies all given hyper-parameters during training and inherits weights. We introduce a novel knowledge distilling scheme. Only the best individuals of the population are allowed to share part of their knowledge about the training data with the whole population. This embraces the idea of randomness between the models, rather than avoiding it, because the resulting diversity of models is important for the population’s evolution. Our experiments on MNIST, fashionMNIST, and EMNIST (MNIST split) with two classic model architectures show significant improvements to convergence and model accuracy compared to the original algorithm. In addition, we conduct experiments on EMNIST (balanced split) employing a ResNet and a WideResNet architecture to include complex architectures and data as well.
KW - Asynchronous evolutionary algorithms
KW - Hyper-parameter optimization
KW - Population based training
UR - http://www.scopus.com/inward/record.url?scp=85054508337&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-00111-7_22
DO - 10.1007/978-3-030-00111-7_22
M3 - Article in proceedings
AN - SCOPUS:85054508337
SN - 9783030001100
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 258
EP - 269
BT - KI 2018
A2 - Turhan, Anni-Yasmin
A2 - Trollmann, Frank
PB - Springer Verlag
T2 - 41st German Conference on Artificial Intelligence, KI 2018
Y2 - 24 September 2018 through 28 September 2018
ER -