Virtual Event, January-February, 2021
This year AABI will be a virtual event consisting of six weekly online seminars held from January through February 2021. Each seminar will be broadcast via Zoom and simultaneously live-streamed on the AABI YouTube channel. Zoom registration is free but capacity is limited. Note that you must register for each seminar individually.
Invited | Michael I. Jordan: On the Theory of Gradient-Based Optimization and Sampling: A View from Continuous Time
Abstract: Gradient-based optimization has provided the theoretical and practical foundations on which recent developments in statistical machine learning have reposed. A complementary set of foundations is provided by Monte Carlo sampling, where gradient-based methods have also been leading the way in recent years. We explore links between gradient-based optimization algorithms and gradient-based sampling algorithms. Although these algorithms are generally studied in discrete time, we find that fundamental insights can be obtained more readily if we work in continuous time. Results that I will cover include: (1) there is a counterpart of Nesterov acceleration in the world of Langevin diffusion; (2) Langevin algorithms can converge quickly enough to give logarithmic regret in Bayesian multi-arm bandits; and (3) symplectic integration conserves rates of convergence from continuous time to discrete time.
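The sketch below is an illustrative (not talk-specific) example of the gradient link between optimization and sampling that the abstract describes: the unadjusted Langevin algorithm, a simple discretization of the Langevin diffusion, run on a toy Gaussian target. The target, step size, and variable names are chosen only for illustration.

```python
# Minimal sketch: unadjusted Langevin algorithm (ULA) on a toy Gaussian target.
# Dropping the injected noise recovers plain gradient descent on -log p,
# which is the optimization/sampling link in its simplest form.
import numpy as np

def grad_log_density(x):
    # log p(x) = -0.5 * ||x||^2 (standard Gaussian), so grad log p(x) = -x
    return -x

def ula(n_steps=5000, step=0.1, dim=2, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    samples = []
    for _ in range(n_steps):
        noise = rng.normal(size=dim)
        # Gradient step on log p plus Gaussian noise scaled by sqrt(2 * step)
        x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        samples.append(x.copy())
    return np.array(samples)

samples = ula()
print("empirical mean:", samples[1000:].mean(axis=0))
print("empirical var: ", samples[1000:].var(axis=0))
```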
Invited | Maja Rudolph: Variational Dynamic Mixtures
Abstract: Many deep probabilistic time series models struggle with sequences with multi-modal dynamics. While powerful generative models have been developed, we show evidence that the associated approximate inference methods are usually too restrictive and can lead to mode averaging. Mode averaging is problematic in highly multi-modal real world sequences, as it can result in unphysical predictions (e.g., predicted taxi trajectories might run through buildings on the street map if they average between the options to go either right or left). This talk is about variational dynamic mixtures (VDM): a new variational family to infer sequential latent variables with multi-modal dynamics. The VDM approximate posterior at each time step is a mixture density network, whose parameters come from propagating multiple samples through a recurrent architecture. This results in an expressive multi-modal posterior approximation. In an empirical study, we show that VDM outperforms competing approaches on highly multi-modal datasets from different domains.
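The following toy sketch illustrates the general idea the abstract describes: propagate several samples through a recurrent update and let each one parameterize a component of a per-step mixture posterior. The cell, mixture weights, and sizes below are illustrative placeholders, not the authors' architecture.

```python
# Toy sketch of a per-step mixture posterior built from multiple propagated
# samples. Everything here (transition, weights, dimensions) is illustrative.
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 2                              # mixture components, latent dimension
W = rng.normal(scale=0.3, size=(D, D))   # toy "recurrent" transition weights

def propagate(z_prev):
    """One toy recurrent step mapping a latent sample to component parameters."""
    h = np.tanh(W @ z_prev)
    mean, log_std = h, np.full(D, -1.0)  # each sample yields a Gaussian component
    return mean, log_std

# K samples from the previous step's posterior approximation
z_prev = rng.normal(size=(K, D))
components = [propagate(z) for z in z_prev]
weights = np.full(K, 1.0 / K)            # uniform mixture weights for illustration

def sample_mixture():
    k = rng.choice(K, p=weights)
    mean, log_std = components[k]
    return mean + np.exp(log_std) * rng.normal(size=D)

print(sample_mixture())
```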
Contributed | Wessel Bruinsma: The Gaussian Neural Process
Paper
Talk
Contributed | Tomas Geffner: Empirical Evaluation of Biased Methods for Alpha Divergence Minimization
Paper
Talk
Invited | Justin Domke: Some Embarrassing Questions about Variational Inference
Abstract: Black-box variational inference solves ever-more complex models at ever-higher scale. Yet we can't answer some basic questions: What's the best way to estimate the gradient? Can we ensure optimization doesn't diverge? Is it realistic to optimize other divergences? What's happening when we integrate Monte Carlo estimators into the objective? In this talk I'll explain why these questions aren't as innocent as they seem, along with some partial progress towards answers.
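As background for the gradient-estimation question, here is a minimal sketch of one standard choice: the reparameterization estimator of the ELBO gradient for a mean-field Gaussian variational family, applied to a toy Gaussian target. The target, step sizes, and names are illustrative only.

```python
# Minimal sketch of the reparameterization ("pathwise") gradient estimator for
# black-box variational inference with a mean-field Gaussian q on a toy target.
import numpy as np

rng = np.random.default_rng(0)
D = 2

def grad_log_p(z):
    # Toy target: unnormalized N(1, I), so grad log p(z) = -(z - 1)
    return -(z - 1.0)

def elbo_grad(mu, log_sigma, n_samples=64):
    """Monte Carlo reparameterization gradient of the ELBO w.r.t. (mu, log_sigma)."""
    eps = rng.normal(size=(n_samples, D))
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps                     # z = mu + sigma * eps, eps ~ N(0, I)
    g = grad_log_p(z)                        # pathwise gradient of E_q[log p(z)]
    grad_mu = g.mean(axis=0)
    # d/d log_sigma of E_q[log p] plus the Gaussian entropy gradient (+1 per dim)
    grad_log_sigma = (g * eps * sigma).mean(axis=0) + 1.0
    return grad_mu, grad_log_sigma

mu, log_sigma = np.zeros(D), np.zeros(D)
for _ in range(500):
    g_mu, g_ls = elbo_grad(mu, log_sigma)
    mu, log_sigma = mu + 0.05 * g_mu, log_sigma + 0.05 * g_ls
print("mu:", mu, "sigma:", np.exp(log_sigma))
```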
Contributed | Emiel Hoogeboom: Argmax Flows: Learning Categorical Distributions with Normalizing Flows
Paper
Talk
Contributed | Erik Daxberger: Expressive yet Tractable Bayesian Deep Learning via Subnetwork Inference
Paper
Talk
Invited | Fredrik Lindsten: Sequential Monte Carlo for Approximate Bayesian Inference
Abstract: Sequential Monte Carlo (SMC) is a powerful class of methods for approximate Bayesian inference. While originally used mainly for signal processing and inference in dynamical systems, these methods are in fact much more general and can be used to solve many challenging problems in Bayesian statistics and machine learning, even if they lack apparent sequential structure. In this talk I will first discuss the foundations of SMC from a machine learning perspective. We will see that there are two main design choices of SMC: the proposal distribution and the so-called intermediate target distributions, where the latter is often overlooked in practice. Focusing on graphical model inference, I will then show how deterministic approximations, such as variational inference and expectation propagation, can be used to approximate the optimal intermediate target distributions. The resulting algorithm can be viewed as a post-correction of the biases associated with these deterministic approximations. Numerical results show improvements over the baseline deterministic methods as well as over “plain” SMC. The first part of the talk is an introduction to SMC inspired by our recent Foundations and Trends tutorial. The second part of the talk, focusing on combining SMC and deterministic approximations for graphical model inference, is based on this paper.
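For readers new to SMC, the sketch below shows its simplest instance, a bootstrap particle filter, in which the proposal is the transition prior and the intermediate targets are the filtering distributions (the two design choices highlighted in the abstract). The toy linear-Gaussian model and all settings are illustrative.

```python
# Minimal sketch: bootstrap particle filter for a toy linear-Gaussian model
# x_t = a*x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2).
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 500                      # time steps, particles
a, q, r = 0.9, 0.5, 1.0             # transition coeff, process noise, obs noise

# Simulate a ground-truth trajectory and observations
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + q * rng.normal()
y = x_true + r * rng.normal(size=T)

particles = rng.normal(size=N)
means = []
for t in range(T):
    # Propose from the transition prior (the "bootstrap" proposal choice)
    particles = a * particles + q * rng.normal(size=N)
    # Weight by the observation likelihood
    logw = -0.5 * ((y[t] - particles) / r) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    means.append(np.sum(w * particles))
    # Multinomial resampling
    particles = particles[rng.choice(N, size=N, p=w)]

print("filtering RMSE:", np.sqrt(np.mean((np.array(means) - x_true) ** 2)))
```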
Contributed | Jae Hyun Lim: Bijective-Contrastive Estimation
Paper
Talk
Contributed | Nikolai Zaki: Evidence Estimation by Kullback-Leibler Integration for Flow-Based Methods
Paper
Talk
Invited | Mijung Park: ABCDP: Approximate Bayesian Computation & Differential Privacy
Abstract: We develop a novel approximate Bayesian computation framework, called ABCDP, that produces differentially private posterior samples. Our framework requires minimal modification to existing ABC algorithms. We theoretically analyze the interplay between the noise added for the privacy guarantee and the accuracy of the ABC posterior samples. We apply ABCDP to simulated data as well as privacy-sensitive real data and show the efficacy of the proposed framework.
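For context, here is a minimal sketch of plain ABC rejection sampling on a toy Gaussian-mean problem. ABCDP builds a differential-privacy mechanism on top of algorithms like this one; that mechanism is not shown here, and the model, summary statistic, and tolerance are illustrative.

```python
# Minimal sketch: ABC rejection sampling for the mean of a Gaussian.
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(loc=2.0, scale=1.0, size=100)      # observed data
s_obs = obs.mean()                                  # summary statistic

def simulate(theta, n=100):
    return rng.normal(loc=theta, scale=1.0, size=n)

accepted = []
eps = 0.1                                           # ABC tolerance
while len(accepted) < 500:
    theta = rng.normal(loc=0.0, scale=5.0)          # draw from the prior
    s_sim = simulate(theta).mean()
    if abs(s_sim - s_obs) < eps:                    # accept if summaries are close
        accepted.append(theta)

print("ABC posterior mean:", np.mean(accepted))
```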
Contributed | Hao Wu: Conjugate Energy-Based Models
Paper
Talk
Contributed | William Tebbutt: Combining Pseudo-Point and State Space Approximations for Sum-Separable Gaussian Processes
Paper
Talk
Invited | Jascha Sohl-Dickstein: Infinite Width Bayesian Neural Networks
Abstract: As neural networks become wider their accuracy improves, and their behavior becomes easier to analyze theoretically. I will give an introduction to a rapidly growing body of work which examines the distribution over functions induced by infinitely wide, randomly initialized, neural networks. Core results that I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel; that the predictions of wide neural networks are linear in their parameters throughout training; that this perspective enables analytic predictions for how trainability of finite width networks depends on hyperparameters and architecture; and finally that results on infinite width networks can enable efficient posterior sampling from finite width Bayesian networks. These results provide for surprising capabilities -- for instance, the evaluation of test set predictions which would come from an infinitely wide Bayesian or gradient-descent-trained neural network without ever instantiating a neural network, or the rapid training of 10,000+ layer convolutional networks. I will argue that this growing understanding of neural networks in the limit of infinite width is foundational for future theoretical and practical understanding of deep learning.
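The sketch below illustrates one of the core results mentioned in the abstract: the Gaussian-process kernel induced by an infinitely wide one-hidden-layer ReLU network (the arc-cosine/NNGP recursion), used for toy GP regression. The weight and bias variances, noise level, and data are illustrative choices.

```python
# Hedged sketch: NNGP kernel of an infinitely wide 1-hidden-layer ReLU network,
# used for Gaussian-process regression on toy 1-D data.
import numpy as np

def nngp_relu_kernel(X1, X2, sigma_w=1.0, sigma_b=0.1):
    """Covariance of the outputs of an infinitely wide one-hidden-layer ReLU net."""
    d = X1.shape[1]
    # Input-layer covariances
    k12 = sigma_b**2 + sigma_w**2 * (X1 @ X2.T) / d
    k11 = sigma_b**2 + sigma_w**2 * np.sum(X1**2, axis=1) / d
    k22 = sigma_b**2 + sigma_w**2 * np.sum(X2**2, axis=1) / d
    norm = np.sqrt(np.outer(k11, k22))
    theta = np.arccos(np.clip(k12 / norm, -1.0, 1.0))
    # ReLU (arc-cosine) recursion for the hidden layer
    return sigma_b**2 + sigma_w**2 / (2 * np.pi) * norm * (
        np.sin(theta) + (np.pi - theta) * np.cos(theta)
    )

# Toy GP regression with the NNGP kernel
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)
Xs = np.linspace(-3, 3, 5).reshape(-1, 1)

K = nngp_relu_kernel(X, X) + 0.1**2 * np.eye(20)   # add observation noise
Ks = nngp_relu_kernel(Xs, X)
posterior_mean = Ks @ np.linalg.solve(K, y)
print(posterior_mean)
```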
Contributed | Matthew Hoffman: Roundoff Error in Metropolis-Hastings Accept-Reject Steps
Paper
Talk
Contributed | Alex Alemi: VIB is Half Bayes
Paper
Talk