## 4th Symposium on Advances in Approximate Bayesian Inference

Virtual Event, February 1st and 2nd, 2022

This year AABI will be a virtual event consisting of two days of online seminars held on February 1st-2nd, 2022. Each seminar will be broadcast via Zoom and simultaneously live-streamed on the AABI 2022 YouTube channel. Zoom registration is free but limited.

### Day 1 (Feb 1st)

 4:00-4:40 pm GMT Invited Aki Vehtari: Pareto-k as practical pre-asymptotic diagnostic of Monte Carlo estimates
 Abstract: I discuss the use of the Pareto-k diagnostic as a simple and practical approach for estimating pre-asymptotic reliability of Monte Carlo estimates, with examples in importance sampling, stochastic optimization, and variational inference.
 4:40-5:00 pm GMT Contributed Bayesian Learning via Neural Schrödinger-Föllmer Flows
 5:00-5:40 pm GMT Invited Pavel Izmailov: What Are Bayesian Neural Network Posteriors Really Like?
 Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains; (3) in contrast to recent studies, we find posterior tempering is not needed for near-optimal performance, with little evidence for a "cold posterior" effect, which we show is largely an artifact of data augmentation; (4) BMA performance is robust to the choice of prior scale, and relatively similar for diagonal Gaussian, mixture of Gaussian, and logistic priors; (5) Bayesian neural networks show surprisingly poor generalization under domain shift; we demonstrate, explain and provide remedies for this effect; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods can provide good generalization, they provide distinct predictive distributions from HMC. Notably, deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
 5:40-6:00 pm GMT Coffee Break
 6:00-6:40 pm GMT Invited Lester Mackey: Kernel Thinning and Stein Thinning
 Abstract: This talk will introduce two new tools for summarizing a probability distribution more effectively than independent sampling or standard Markov chain Monte Carlo thinning: (1) Given an initial n point summary (for example, from independent sampling or a Markov chain), kernel thinning finds a subset of only square-root n points with comparable worst-case integration error across a reproducing kernel Hilbert space. (2) If the initial summary suffers from biases due to off-target sampling, tempering, or burn-in, Stein thinning simultaneously compresses the summary and improves the accuracy by correcting for these biases. These tools are especially well-suited for tasks that incur substantial downstream computation costs per summary point, like organ and tissue modeling in which each simulation consumes 1000s of CPU hours.
 6:40-7:00 pm GMT Contributed Linearised Laplace Inference in Networks with Normalisation Layers and the Neural g-Prior
 7:00-7:20 pm GMT Contributed Sampling with Mirror Stein Operators
 7:30-9:00 pm GMT Poster Session Please join Gathertown here.

### Day 2 (Feb 2nd)

 2:00-2:40 pm GMT Invited Pierre Alquier: What can we expect from PAC-Bayes bounds?
 Abstract: PAC-Bayes bounds were developed to understand the generalization ability of randomized predictors, ensemble methods and Bayesian machine learning algorithms. However, a naive application of these bounds to sophisticated algorithms usually leads to vacuous generalization certificates. Many improvements were proposed in the past few years to obtain non-vacuous guarantees. Recently some very tight certificates were obtained. However, some ideas beyond these improvements are not totally understood. In this talk, I will illustrate with very simple examples what can go very wrong with PAC-Bayes bounds. I will then discuss how to fix these issues by choosing better priors. This will also highlight a deep connection to the recent literature on Mutual Information bounds. In some models, this leads to a clear view of how tight the certificates obtained from PAC-Bayes bounds can be.
 2:40-3:00 pm GMT Contributed Sliced Wasserstein Variational Inference
 3:00-3:40 pm GMT Invited Yixin Wang: Posterior Collapse and Latent Variable Non-identifiability
 Abstract: Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we show that posterior collapse is a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data. This is joint work with David Blei and John Cunningham.
 3:40-4:00 pm GMT Coffee Break
 4:00-4:40 pm GMT Invited Kunal Talwar: Privacy Amplification by Shuffling
 Abstract: Traditionally, Differential Privacy has been studied in two models: the local model which requires little trust assumptions, and the central model which needs a trusted curator and can achieve better utility. This talk will be about recent works showing that random shuffling amplifies differential privacy guarantees of locally randomized data. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously and allows us to get the strong utility of the central model without a trusted curator, as long as we can implement a secure shuffler. We show that random shuffling of $n$ data records that are input to $\varepsilon_0$-differentially private local randomizers results in an $(O(\sqrt{\frac{e^{\varepsilon_0}\log(1/\delta)}{n}}), \delta)$-differentially private algorithm. This significantly improves over previous work and achieves the asymptotically optimal dependence in $\varepsilon_0$. Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees. Importantly, our work also yields an algorithm for deriving tighter bounds on the resulting $\varepsilon$ and $\delta$ as well as Rényi differential privacy guarantees. We show numerically that our algorithm gets to within a small constant factor of the optimal bound. As a direct corollary of our analysis we derive a simple and nearly optimal algorithm for frequency estimation in the shuffle model of privacy. We also observe that our result implies the first asymptotically optimal privacy analysis of noisy stochastic gradient descent that applies to sampling without replacement.
 4:40-5:00 pm GMT Contributed Deep Reference Priors: What is the best way to pretrain a model?
 5:00-5:20 pm GMT Contributed Fast Finite Width Neural Tangent Kernel
 5:30-6:30 pm GMT Panel: Michael Betancourt, Jan-Willem van de Meent, Chris J. Maddison, Karen Ullrich, Justin Domke. Moderator: Stephan Mandt