A bound for the convergence rate of parallel tempering for sampling restricted Boltzmann machines

Asja Fischer; Christian Igel

doi:10.1016/j.tcs.2015.05.019

A bound for the convergence rate of parallel tempering for sampling restricted Boltzmann machines

9 Citations (Scopus)

Abstract

Sampling from restricted Boltzmann machines (RBMs) is done by Markov chain Monte Carlo (MCMC) methods. The faster the convergence of the Markov chain, the more efficiently can high quality samples be obtained. This is also important for robust training of RBMs, which usually relies on sampling. Parallel tempering (PT), an MCMC method that maintains several replicas of the original chain at higher temperatures, has been successfully applied for RBM training. We present the first analysis of the convergence rate of PT for sampling from binary RBMs. The resulting bound on the rate of convergence of the PT Markov chain shows an exponential dependency on the size of one layer and the absolute values of the RBM parameters. It is minimized by a uniform spacing of the inverse temperatures, which is often used in practice. Similarly as in the derivation of bounds on the approximation error for contrastive divergence learning, our bound on the mixing time implies an upper bound on the error of the gradient approximation when the method is used for RBM training.

Original language	English
Journal	Theoretical Computer Science
Volume	598
Pages (from-to)	102-117
Number of pages	16
ISSN	0304-3975
DOIs	https://doi.org/10.1016/j.tcs.2015.05.019
Publication status	Published - 20 Sept 2015

Access to Document

10.1016/j.tcs.2015.05.019

Cite this

@article{02de52cc2ad44015b280cce1ffe485af,

title = "A bound for the convergence rate of parallel tempering for sampling restricted Boltzmann machines",

abstract = "Sampling from restricted Boltzmann machines (RBMs) is done by Markov chain Monte Carlo (MCMC) methods. The faster the convergence of the Markov chain, the more efficiently can high quality samples be obtained. This is also important for robust training of RBMs, which usually relies on sampling. Parallel tempering (PT), an MCMC method that maintains several replicas of the original chain at higher temperatures, has been successfully applied for RBM training. We present the first analysis of the convergence rate of PT for sampling from binary RBMs. The resulting bound on the rate of convergence of the PT Markov chain shows an exponential dependency on the size of one layer and the absolute values of the RBM parameters. It is minimized by a uniform spacing of the inverse temperatures, which is often used in practice. Similarly as in the derivation of bounds on the approximation error for contrastive divergence learning, our bound on the mixing time implies an upper bound on the error of the gradient approximation when the method is used for RBM training.",

keywords = "Restricted Boltzmann machines, Parallel tempering, Mixing time",

author = "Asja Fischer and Christian Igel",

year = "2015",

month = sep,

day = "20",

doi = "10.1016/j.tcs.2015.05.019",

language = "English",

volume = "598",

pages = "102--117",

journal = "Theoretical Computer Science",

issn = "0304-3975",

publisher = "Elsevier",

}

TY - JOUR

T1 - A bound for the convergence rate of parallel tempering for sampling restricted Boltzmann machines

AU - Fischer, Asja

AU - Igel, Christian

PY - 2015/9/20

Y1 - 2015/9/20

N2 - Sampling from restricted Boltzmann machines (RBMs) is done by Markov chain Monte Carlo (MCMC) methods. The faster the convergence of the Markov chain, the more efficiently can high quality samples be obtained. This is also important for robust training of RBMs, which usually relies on sampling. Parallel tempering (PT), an MCMC method that maintains several replicas of the original chain at higher temperatures, has been successfully applied for RBM training. We present the first analysis of the convergence rate of PT for sampling from binary RBMs. The resulting bound on the rate of convergence of the PT Markov chain shows an exponential dependency on the size of one layer and the absolute values of the RBM parameters. It is minimized by a uniform spacing of the inverse temperatures, which is often used in practice. Similarly as in the derivation of bounds on the approximation error for contrastive divergence learning, our bound on the mixing time implies an upper bound on the error of the gradient approximation when the method is used for RBM training.

AB - Sampling from restricted Boltzmann machines (RBMs) is done by Markov chain Monte Carlo (MCMC) methods. The faster the convergence of the Markov chain, the more efficiently can high quality samples be obtained. This is also important for robust training of RBMs, which usually relies on sampling. Parallel tempering (PT), an MCMC method that maintains several replicas of the original chain at higher temperatures, has been successfully applied for RBM training. We present the first analysis of the convergence rate of PT for sampling from binary RBMs. The resulting bound on the rate of convergence of the PT Markov chain shows an exponential dependency on the size of one layer and the absolute values of the RBM parameters. It is minimized by a uniform spacing of the inverse temperatures, which is often used in practice. Similarly as in the derivation of bounds on the approximation error for contrastive divergence learning, our bound on the mixing time implies an upper bound on the error of the gradient approximation when the method is used for RBM training.

KW - Restricted Boltzmann machines

KW - Parallel tempering

KW - Mixing time

U2 - 10.1016/j.tcs.2015.05.019

DO - 10.1016/j.tcs.2015.05.019

M3 - Journal article

SN - 0304-3975

VL - 598

SP - 102

EP - 117

JO - Theoretical Computer Science

JF - Theoretical Computer Science

ER -

A bound for the convergence rate of parallel tempering for sampling restricted Boltzmann machines

Abstract

Access to Document

Fingerprint

Cite this