4th Symposium on
Advances in Approximate Bayesian Inference

Virtual Event, February 1st and 2nd, 2022


This year AABI will be a virtual event consisting of two days of online seminars held on February 1st and 2nd, 2022. Each seminar will be broadcast via Zoom and simultaneously live-streamed on the AABI 2022 YouTube channel. Zoom registration is free, but capacity is limited.


Day 1 (Feb 1st)

4:00-4:40 pm GMT Invited Aki Vehtari: Pareto-k as practical pre-asymptotic diagnostic of Monte Carlo estimates Video
Abstract: I discuss the use of the Pareto-k diagnostic as a simple and practical approach for estimating pre-asymptotic reliability of Monte Carlo estimates, with examples in importance sampling, stochastic optimization, and variational inference.
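As a rough illustration of the idea (a toy sketch, not the speaker's implementation), the snippet below fits a generalized Pareto distribution to the largest importance weights of a deliberately mismatched proposal and reports the estimated shape parameter k; values above roughly 0.7 are the usual warning sign that the importance-sampling estimate is unreliable. The target, proposal, and tail-size heuristic are all placeholder choices.

    import numpy as np
    from scipy.stats import genpareto, norm

    rng = np.random.default_rng(0)

    # Toy importance sampler: proposal N(0, 1), target N(0, 2^2) -- the target has
    # heavier tails than the proposal, so the importance weights are badly behaved.
    n = 5000
    x = rng.normal(0.0, 1.0, size=n)
    log_w = norm.logpdf(x, 0.0, 2.0) - norm.logpdf(x, 0.0, 1.0)
    w = np.exp(log_w - log_w.max())          # rescale for numerical stability

    # Fit a generalized Pareto distribution to the tail of the weights and read off k.
    m = int(min(0.2 * n, 3 * np.sqrt(n)))    # heuristic tail size
    sorted_w = np.sort(w)
    exceedances = sorted_w[-m:] - sorted_w[-m - 1]
    k_hat, _, _ = genpareto.fit(exceedances, floc=0.0)
    print(f"Pareto k-hat = {k_hat:.2f} (values above ~0.7 flag unreliable estimates)")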
4:40-5:00 pm GMT Contributed Bayesian Learning via Neural Schrödinger-Föllmer Flows Video
5:00-5:40 pm GMT Invited Pavel Izmailov: What Are Bayesian Neural Network Posteriors Really Like? Video
Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains; (3) in contrast to recent studies, we find posterior tempering is not needed for near-optimal performance, with little evidence for a "cold posterior" effect, which we show is largely an artifact of data augmentation; (4) Bayesian model average (BMA) performance is robust to the choice of prior scale, and relatively similar for diagonal Gaussian, mixture-of-Gaussian, and logistic priors; (5) Bayesian neural networks show surprisingly poor generalization under domain shift; we demonstrate, explain, and provide remedies for this effect; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods can provide good generalization, they produce predictive distributions distinct from HMC's. Notably, deep ensemble predictive distributions are about as close to HMC's as those of standard SGLD, and closer than those of standard variational inference.
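For readers who want to see the basic machinery behind "full-batch HMC", here is a minimal, generic Hamiltonian Monte Carlo step with a leapfrog integrator (a sketch with a stand-in Gaussian target; the paper's large-scale neural-network experiments are not reproduced here, and the step size and trajectory length are arbitrary).

    import numpy as np

    rng = np.random.default_rng(0)

    def log_prob(q):
        # Placeholder target: standard Gaussian log density (stands in for a BNN log posterior).
        return -0.5 * np.sum(q ** 2)

    def grad_log_prob(q):
        return -q

    def hmc_step(q, step_size=0.1, n_leapfrog=20):
        """One full-batch HMC transition: leapfrog integration plus Metropolis correction."""
        p = rng.normal(size=q.shape)                 # resample momentum
        q_new, p_new = q.copy(), p.copy()
        p_new += 0.5 * step_size * grad_log_prob(q_new)
        for _ in range(n_leapfrog - 1):
            q_new += step_size * p_new
            p_new += step_size * grad_log_prob(q_new)
        q_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_prob(q_new)
        # Accept or reject based on the change in total energy (Hamiltonian).
        h_old = -log_prob(q) + 0.5 * np.sum(p ** 2)
        h_new = -log_prob(q_new) + 0.5 * np.sum(p_new ** 2)
        return q_new if np.log(rng.uniform()) < h_old - h_new else q

    q = np.zeros(10)
    samples = []
    for _ in range(1000):
        q = hmc_step(q)
        samples.append(q.copy())
    print("posterior mean estimate (first 3 dims):", np.mean(samples, axis=0)[:3])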
5:40-6:00 pm GMT Coffee Break
6:00-6:40 pm GMT Invited Lester Mackey: Kernel Thinning and Stein Thinning Video
Abstract: This talk will introduce two new tools for summarizing a probability distribution more effectively than independent sampling or standard Markov chain Monte Carlo thinning: 1. Given an initial n-point summary (for example, from independent sampling or a Markov chain), kernel thinning finds a subset of only square-root-n points with comparable worst-case integration error across a reproducing kernel Hilbert space. 2. If the initial summary suffers from biases due to off-target sampling, tempering, or burn-in, Stein thinning simultaneously compresses the summary and improves its accuracy by correcting for these biases. These tools are especially well-suited for tasks that incur substantial downstream computation costs per summary point, like organ and tissue modeling, in which each simulation consumes thousands of CPU hours.
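To give a feel for what "compressing a summary" means, the sketch below runs a simple greedy, kernel-herding-style selection that picks roughly square-root-n points whose kernel mean embedding tracks the full sample under a Gaussian kernel. It only illustrates the goal; it is not the kernel thinning or Stein thinning algorithm from the talk, which come with much stronger guarantees.

    import numpy as np

    rng = np.random.default_rng(0)

    def gaussian_kernel(a, b, bandwidth=1.0):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    def greedy_compress(points, m):
        """Greedily select m points whose empirical distribution tracks the full sample
        in RKHS norm (kernel-herding-style; a stand-in for the methods in the talk)."""
        n = len(points)
        K = gaussian_kernel(points, points)
        mean_embedding = K.mean(axis=1)       # kernel mean embedding of the full sample
        selected = []
        running_sum = np.zeros(n)
        for t in range(m):
            # Prefer points in high-density regions of the full sample while
            # penalising redundancy with already-selected points.
            scores = mean_embedding - running_sum / (t + 1)
            scores[selected] = -np.inf
            best = int(np.argmax(scores))
            selected.append(best)
            running_sum += K[:, best]
        return points[np.array(selected)]

    x = rng.normal(size=(1024, 2))            # "initial n-point summary"
    coreset = greedy_compress(x, m=32)        # roughly square-root-n points
    print("full-sample mean:", x.mean(axis=0))
    print("coreset mean:    ", coreset.mean(axis=0))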
6:40-7:00 pm GMT Contributed Linearised Laplace Inference in Networks with Normalisation Layers and the Neural g-Prior Video
7:00-7:20 pm GMT Contributed Sampling with Mirror Stein Operators Video
7:30-9:00 pm GMT Poster Session
Please join Gathertown here.

Day 2 (Feb 2nd)

2:00-2:40 pm GMT Invited Pierre Alquier: What can we expect from PAC-Bayes bounds? Video
Abstract: PAC-Bayes bounds were developed to understand the generalization ability of randomized predictors, ensemble methods, and Bayesian machine learning algorithms. However, a naive application of these bounds to sophisticated algorithms usually leads to vacuous generalization certificates. Many improvements have been proposed in the past few years to obtain non-vacuous guarantees, and recently some very tight certificates were obtained. However, some of the ideas behind these improvements are not fully understood. In this talk, I will illustrate with very simple examples what can go very wrong with PAC-Bayes bounds. I will then discuss how to fix these issues by choosing better priors. This will also highlight a deep connection to the recent literature on mutual information bounds. In some models, this leads to a clear view of how tight the certificates obtained from PAC-Bayes bounds can be.
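To make the "vacuous versus non-vacuous" distinction concrete, the snippet below evaluates a classical McAllester-style PAC-Bayes bound for a factorized Gaussian posterior over a large weight vector under two priors. All numbers are invented for illustration; the point is only that a prior far from the posterior inflates the KL term until the certificate exceeds 1, while an informed prior keeps it meaningful.

    import numpy as np

    def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
        """KL( N(mu_q, diag var_q) || N(mu_p, diag var_p) )."""
        return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

    def mcallester_bound(train_error, kl, n, delta=0.05):
        """Test error <= train error + sqrt((KL + log(2 sqrt(n) / delta)) / (2 n))."""
        return train_error + np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))

    d, n = 1_000_000, 50_000                             # number of weights, training set size
    mu_q, var_q = np.full(d, 0.1), np.full(d, 0.05)      # hypothetical learned posterior

    priors = {
        "prior centred at the posterior mean (optimistic)": (mu_q.copy(), np.full(d, 0.06)),
        "naive N(0, 1) prior": (np.zeros(d), np.ones(d)),
    }
    for name, (mu_p, var_p) in priors.items():
        kl = kl_diag_gaussians(mu_q, var_q, mu_p, var_p)
        print(f"{name}: KL = {kl:,.0f}, certificate = {mcallester_bound(0.02, kl, n):.2f}")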
2:40-3:00 pm GMT Contributed Sliced Wasserstein Variational Inference Video
3:00-3:40 pm GMT Invited Yixin Wang: Posterior Collapse and Latent Variable Non-identifiability Video
Abstract: Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we show that posterior collapse is a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data. This is joint work with David Blei and John Cunningham.
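A tiny concrete instance of the "collapse iff non-identifiable" statement (a toy example, not taken from the paper): in a linear Gaussian model x = w·z + noise with a standard normal prior on z, setting w = 0 makes the latent non-identifiable, and the exact posterior p(z | x) reduces to the prior even though no neural network or approximate inference is involved.

    import numpy as np

    def exact_posterior(x, w, sigma2):
        """Exact p(z | x) for z ~ N(0, 1), x | z ~ N(w z, sigma2): a Gaussian."""
        post_var = sigma2 / (w ** 2 + sigma2)
        post_mean = w * x / (w ** 2 + sigma2)
        return post_mean, post_var

    x_obs = 1.7
    for w in (2.0, 0.0):                      # identifiable vs. non-identifiable latent
        mean, var = exact_posterior(x_obs, w, sigma2=0.5)
        print(f"w = {w}: posterior N({mean:.2f}, {var:.2f})   prior N(0.00, 1.00)")
    # With w = 0 the likelihood ignores z, so the exact posterior equals the prior:
    # collapse occurs even with exact inference in a classical probabilistic model.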
3:40-4:00 pm GMT Coffee Break
4:00-4:40 pm GMT Invited Kunal Talwar: Privacy Amplification by Shuffling Video
Abstract: Traditionally, differential privacy has been studied in two models: the local model, which requires few trust assumptions, and the central model, which needs a trusted curator and can achieve better utility. This talk will be about recent works showing that random shuffling amplifies the differential privacy guarantees of locally randomized data. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously, and allows us to get the strong utility of the central model without a trusted curator, as long as we can implement a secure shuffler. We show that random shuffling of $n$ data records that are input to $\varepsilon_0$-differentially private local randomizers results in an $(O((1-e^{-\varepsilon_0})\sqrt{\frac{e^{\varepsilon_0}\log(1/\delta)}{n}}), \delta)$-differentially private algorithm. This significantly improves over previous work and achieves the asymptotically optimal dependence on $\varepsilon_0$. Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees. Importantly, our work also yields an algorithm for deriving tighter bounds on the resulting $\varepsilon$ and $\delta$, as well as Rényi differential privacy guarantees. We show numerically that our algorithm gets to within a small constant factor of the optimal bound. As a direct corollary of our analysis, we derive a simple and nearly optimal algorithm for frequency estimation in the shuffle model of privacy. We also observe that our result implies the first asymptotically optimal privacy analysis of noisy stochastic gradient descent that applies to sampling without replacement.
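As a back-of-the-envelope reading of the bound quoted above, the snippet below plugs a few values of n and the local privacy parameter into the leading expression, dropping the constant hidden in the O(·) and the theorem's validity conditions, so the numbers are only indicative.

    import numpy as np

    def amplified_eps(eps0, n, delta=1e-6):
        """Rough central epsilon after shuffling n reports from eps0-DP local randomizers,
        read off the leading term (1 - e^{-eps0}) * sqrt(e^{eps0} * log(1/delta) / n);
        the hidden constant and the theorem's conditions are ignored."""
        return (1.0 - np.exp(-eps0)) * np.sqrt(np.exp(eps0) * np.log(1.0 / delta) / n)

    for n in (10_000, 100_000, 1_000_000):
        for eps0 in (0.5, 1.0, 2.0):
            print(f"n = {n:>9}, local eps0 = {eps0}: central eps ~ {amplified_eps(eps0, n):.3f}")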
4:40-5:00 pm GMT Contributed Deep Reference Priors: What is the best way to pretrain a model? Video
5:00-5:20 pm GMT Contributed Fast Finite Width Neural Tangent Kernel Video
5:30-6:30 pm GMT Panel Video
Michael Betancourt, Jan-Willem van de Meent, Chris J. Maddison, Karen Ullrich, Justin Domke
Moderator: Stephan Mandt