The workshop's recording is available on YouTube.
|8:30 - 8:35 |
|8:35 - 9:00 |Surya Ganguli: Learning deep generative models by reversing diffusion
|9:00 - 9:15 |Guillaume Dehaene: Expectation Propagation performs a smoothed gradient descent Slides [pptx]
|9:15 - 9:40 |Matthew Johnson: Learning representations that support efficient inference Slides Slides [key]
|9:40 - 10:30 |Aki Vehtari, Daniel Ritchie, Dustin Tran, Ryan Sepassi, Michael Hughes
| |Moderator: Trevor Campbell
|11:00 - 11:15 |James McInerney: B3O: Bayes Empirical Bayes by Bayesian Optimization
|11:15 - 11:35 |Volodymyr Kuleshov: Neural Variational Random Field Learning Slides
| |Jaan Altosaar: Proximity Variational Inference Slides
| |Qiang Liu: Stein Variational Gradient Descent: Theory and Applications Slides
| |Hans-Christian Ruiz Euler: Smoothing Estimates of Diffusion Processes Slides
| |David A. Moore: Symmetrized Variational Inference Slides [pptx]
| |Francisco Ruiz: Rejection Sampling Variational Inference Slides
|11:35 - 1:00 |
|2:10 - 2:35 |Barbara Engelhardt: Variational inference in high dimensional analyses Slides
|2:35 - 3:00 |Jeffrey Regier: Learning an astronomical catalog of the visible universe through scalable Bayesian inference Slides
|3:30 - 3:45 |Matt Hoffman: Inference and Introspection in Deep Generative Models of Sparse Data Slides
|3:45 - 4:10 |Jonathan Huggins: Coresets for Scalable Bayesian Inference Slides [key]
|4:10 - 4:25 |Matthew Graham: Continuously tempered Hamiltonian Monte Carlo
|4:25 - 5:30 |Panel: On the Foundations and Future of Approximate Inference
| |Ryan Adams, Barbara Engelhardt, Philipp Hennig, Richard Turner
| |Moderator: David Blei
Abstract (Surya Ganguli). A large variety of methods have been developed to train deep generative models of complex datasets. A common ingredient in these methods is the joint training of a recognition model, which converts complex data distributions into simple ones, and a generative model, which converts the simple distribution back into the complex data distribution. Here we discuss a particularly simple approach, inspired by non-equilibrium thermodynamics, in which the recognition model diffusively destroys structure in the data and the generative model is a deep neural network. This leads to a framework in which neural networks essentially reverse the arrow of time in an entropy-generating diffusive process. The combined framework allows learning highly flexible families of probability distributions in which learning, sampling, inference, and evaluation remain analytically or computationally tractable. Moreover, the simplicity of the recognition process makes this framework an ideal starting point for the theoretical analysis of unsupervised learning through deep generative modeling.
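The recognition process described above, which diffusively destroys structure, can be sketched as a Gaussian diffusion that repeatedly shrinks the data toward zero and adds fresh noise. This is a minimal sketch, assuming the Gaussian kernel q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I) and a hypothetical constant noise schedule `betas`; the learned reverse (generative) network from the talk is not shown. Because each step is Gaussian, the marginal q(x_T | x_0) has a closed form, which the second helper computes.

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Gaussian forward diffusion q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I).

    Returns the full trajectory [x_0, x_1, ..., x_T] as lists of floats.
    """
    x = list(x0)
    trajectory = [x]
    for beta in betas:
        scale = math.sqrt(1.0 - beta)
        noise_std = math.sqrt(beta)
        # Shrink the previous state toward zero and add fresh Gaussian noise.
        x = [scale * xi + noise_std * rng.gauss(0.0, 1.0) for xi in x]
        trajectory.append(x)
    return trajectory


def marginal_coefficients(betas):
    """Closed form q(x_T | x_0) = N(sqrt(abar) x_0, (1 - abar) I) with abar = prod_t (1 - beta_t).

    Returns (mean coefficient, variance).
    """
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return math.sqrt(alpha_bar), 1.0 - alpha_bar


if __name__ == "__main__":
    rng = random.Random(0)
    betas = [0.02] * 200  # hypothetical constant schedule, chosen for illustration only
    traj = forward_diffusion([3.0, -1.5], betas, rng)
    mean_coef, var = marginal_coefficients(betas)
    # After many steps the data's influence (mean_coef) is small and var is close to 1,
    # i.e. x_T is approximately standard normal regardless of x_0.
    print(len(traj), mean_coef, var)
```

The design choice worth noting is that the forward process has no learned parameters, so all modeling capacity sits in the reverse network that undoes these steps one at a time.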
Abstract (Matthew Johnson). Learned inference networks, like those in variational autoencoders, provide fast approximate inference in flexible, high-capacity models, but they can answer only a few specific inference queries and may become data-intensive to learn as latent variable models grow more complex. At the other extreme, probabilistic graphical models built with exponential-family structure let us constrain our models to enable efficient, compositional inference, but models built only from these tractable components can be too restrictive for complex data such as images and video. I'll compare these approaches and describe one way to combine their strengths, yielding a framework in which neural networks help us learn latent variable representations that support efficient inference.
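To illustrate why exponential-family structure makes inference compositional: in a conjugate Gaussian model, combining a prior with any number of Gaussian evidence terms amounts to adding natural parameters. The sketch below is a one-dimensional toy, not the framework from the talk; the class and function names are hypothetical, and the "potentials" stand in for whatever Gaussian evidence terms one attaches (one could imagine them being produced by a learned network).

```python
from dataclasses import dataclass


@dataclass
class GaussianNatParams:
    """1-D Gaussian in natural-parameter form: density proportional to exp(h*x - J*x^2/2).

    For mean m and variance s2 this means h = m / s2 and J = 1 / s2.
    """
    h: float
    J: float

    @staticmethod
    def from_mean_var(mean, var):
        return GaussianNatParams(mean / var, 1.0 / var)

    def mean_var(self):
        return self.h / self.J, 1.0 / self.J


def combine(prior, potentials):
    """Conjugate posterior: multiplying Gaussian factors adds their natural parameters.

    The result does not depend on the order of the potentials, which is what
    makes this kind of inference compositional.
    """
    h = prior.h + sum(p.h for p in potentials)
    J = prior.J + sum(p.J for p in potentials)
    return GaussianNatParams(h, J)


if __name__ == "__main__":
    prior = GaussianNatParams.from_mean_var(0.0, 1.0)
    # A Gaussian likelihood N(y | x, s2) with y = 2, s2 = 1 contributes h = y / s2, J = 1 / s2.
    evidence = [GaussianNatParams(2.0, 1.0)]
    print(combine(prior, evidence).mean_var())
```

The cost of this tractability is exactly the restriction the abstract mentions: the evidence terms must stay inside the conjugate family.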
Abstract (Barbara Engelhardt). High-dimensional data require structured statistical models for appropriate exploratory analyses; however, the associated approximate inference methods are often not robust to random initialization. This talk considers how to combine results from approximate inference methods across specific models to make those results more robust.
Abstract (Jeffrey Regier). A central problem in astronomy is to infer the locations, colors, and other properties of stars and galaxies appearing in astronomical images. In the first part of this talk, I present a generative model for astronomical images. The number of photons arriving at each pixel during an exposure is Poisson distributed, with a rate parameter that is a deterministic function of the latent properties of the imaged stars and galaxies. A variational Bayes procedure approximates the posterior distribution of these latent properties, attaining state-of-the-art results. In the second part of the talk, I report on scaling the procedure to a 55 TB collection of images. Our implementation is written entirely in Julia, a new high-level, dynamic programming language. Using shared- and distributed-memory parallelism, we demonstrate effective load balancing and scaling on up to 16,384 Xeon cores of the NERSC Cori supercomputer.
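The pixel-level likelihood described above can be sketched for a single small image: each pixel's expected photon count is a deterministic function of the latent source properties, and the observed counts are scored with a Poisson log-likelihood. This is a toy under loudly stated assumptions: a hypothetical isotropic Gaussian point-spread function, a constant sky background, and sources described only by position and flux. The talk's model also covers colors and other properties, and neither its variational Bayes procedure nor its Julia implementation is reproduced here.

```python
import math


def gaussian_psf(dx, dy, width):
    """Hypothetical isotropic Gaussian point-spread function, normalized to integrate to 1."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * width * width)) / (2.0 * math.pi * width * width)


def expected_counts(height, width, sources, background, psf_width):
    """Deterministic Poisson rate per pixel: sky background plus each source's flux spread by the PSF.

    `sources` is a list of (row, col, flux) tuples (a simplified, hypothetical parameterization).
    """
    rates = [[background] * width for _ in range(height)]
    for row, col, flux in sources:
        for i in range(height):
            for j in range(width):
                rates[i][j] += flux * gaussian_psf(i - row, j - col, psf_width)
    return rates


def poisson_log_likelihood(image, rates):
    """Sum over pixels of log Poisson(k | lam) = k log(lam) - lam - log(k!)."""
    total = 0.0
    for img_row, rate_row in zip(image, rates):
        for k, lam in zip(img_row, rate_row):
            total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


if __name__ == "__main__":
    true_rates = expected_counts(11, 11, [(5, 5, 200.0)], 1.0, 1.5)
    image = [[round(lam) for lam in row] for row in true_rates]  # noiseless stand-in for real counts
    for guess in [(5, 5, 200.0), (1, 1, 200.0)]:
        rates = expected_counts(11, 11, [guess], 1.0, 1.5)
        print(guess, poisson_log_likelihood(image, rates))
```

An inference procedure would place a posterior (or a variational approximation) over the source parameters that enter `expected_counts`; the log-likelihood above is the term it would need to evaluate.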
Abstract (Jonathan Huggins). The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. I instead leverage the insight that data are often redundant to obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. This small coreset can then be used in any number of existing posterior inference algorithms without modification. In this talk, I discuss an efficient coreset construction algorithm that calculates a likelihood-specific importance distribution over the data, then subsamples and re-weights the data using that distribution. The algorithm yields an approximation of the log-likelihood up to a multiplicative error. Crucially for the large-scale data setting, the approach permits efficient construction of coresets in both streaming and parallel settings. I show how to apply the algorithm to construct coresets for Bayesian logistic regression models. I give theoretical guarantees on the size and approximation quality of the coreset, both for fixed, known datasets and in expectation for a wide class of data generative models. I demonstrate the efficacy of the approach on a number of synthetic and real-world datasets and find that, in practice, the size of the coreset is independent of the original dataset size. To conclude, I discuss shortcomings of the multiplicative approximation guarantee provided by the coreset construction algorithm and why it is not ideal for the Bayesian setting, and I propose an alternative approximation guarantee that is better suited for obtaining high-quality Bayesian inferences.
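The subsample-and-reweight step described above can be sketched for logistic regression: draw M points with probability proportional to an importance score, then weight each drawn point by (times drawn) / (M * p_n) so that the weighted log-likelihood is an unbiased estimate of the full one for any fixed theta. This is a sketch under an explicit assumption: the scores used in the demo (1 + ||x||) are a hypothetical stand-in, not the likelihood-specific importance distribution from the talk, and the helper names are illustrative.

```python
import math
import random
from collections import Counter


def logistic_log_lik(theta, x, y):
    """log p(y | x, theta) for y in {-1, +1}, i.e. -log(1 + exp(-y * <x, theta>))."""
    z = y * sum(xi * ti for xi, ti in zip(x, theta))
    # Numerically stable form: log(1 + exp(-z)) = max(-z, 0) + log1p(exp(-|z|)).
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))


def build_coreset(scores, size, rng):
    """Importance-sample `size` indices with probability proportional to `scores` (with replacement).

    Returns {index: weight}, where weight = (times drawn) / (size * p_n).
    """
    total = sum(scores)
    probs = [s / total for s in scores]
    draws = rng.choices(range(len(scores)), weights=probs, k=size)
    counts = Counter(draws)
    return {n: c / (size * probs[n]) for n, c in counts.items()}


def coreset_log_lik(theta, data, coreset):
    """Weighted log-likelihood over the coreset only."""
    return sum(w * logistic_log_lik(theta, data[n][0], data[n][1]) for n, w in coreset.items())


def full_log_lik(theta, data):
    return sum(logistic_log_lik(theta, x, y) for x, y in data)


if __name__ == "__main__":
    rng = random.Random(0)
    theta_true = [1.0, -2.0]
    data = []
    for _ in range(2000):  # synthetic data, for illustration only
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(x, theta_true))))
        data.append((x, 1 if rng.random() < p else -1))
    scores = [1.0 + math.hypot(*x) for x, _ in data]  # hypothetical proxy, NOT the talk's construction
    coreset = build_coreset(scores, 200, rng)
    print(len(coreset), full_log_lik(theta_true, data), coreset_log_lik(theta_true, data, coreset))
```

Any sampler or optimizer that accepts per-point weights can then run on the coreset unchanged; how closely the weighted log-likelihood tracks the full one depends on how well the scores reflect each point's influence, which is the part the talk's algorithm addresses.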