Abstract
Accurate estimates of the normalization constants (partition functions) of energy-based probabilistic models (Markov random fields) are highly important, for example, for assessing the performance of models, monitoring training progress, and conducting likelihood ratio tests. Several algorithms for estimating the partition function (in relation to a reference distribution) have been introduced, including Annealed Importance Sampling (AIS) and Bennett's Acceptance Ratio method (BAR). However, their conceptual similarities and differences have not been worked out so far and systematic comparisons of their behavior in practice have been missing. We devise a unifying theoretical framework for these algorithms, which comprises existing variants and suggests new approaches. It is based on a generalized form of Crooks' equality linking the expectation over a distribution of samples generated by a transition operator to the expectation over the distribution induced by the reversed operator. The framework covers different ways of generating samples, such as parallel tempering and path sampling. An empirical comparison revealed the differences between the methods when estimating the partition function of restricted Boltzmann machines and Ising models. In our experiments, BAR using parallel tempering worked well with a small number of bridging distributions, while path sampling based AIS performed best when many bridging distributions were available. Because BAR gave the overall best results, we favor it over AIS. Furthermore, the experiments showed the importance of choosing a proper reference distribution.
Original language | English |
---|---|
Article number | 103195 |
Journal | Artificial Intelligence |
Volume | 278 |
ISSN | 0004-3702 |
DOIs | |
Publication status | Published - Jan 2020 |
Keywords
- Annealed importance sampling
- Bennett's acceptance ratio
- Bridge sampling
- Crooks' equality
- Ising model
- Parallel tempering
- Partition function estimation
- Restricted Boltzmann machines