10:00 – 10:30 Talk – John Wilmes (Georgia Tech)
10:30 – 11:00 Talk – Antonio Blanca (Georgia Tech)
11:00 – 12:00 Keynote by Robert Schapire (Microsoft Research NYC)
Lunch in Klaus Atrium
John Wilmes (Georgia Tech)
Title: The Complexity of Learning Neural Networks
The empirical successes of neural networks currently lack rigorous theoretical explanation. A first step might be to show that data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently. We demonstrate a surprisingly general lower bound. For inputs drawn from any logconcave distribution, we construct a family of functions that is hard to learn in the sense that any statistical query algorithm (including all known variants of stochastic gradient descent with any loss function, for any model architecture) needs an exponential number of queries even using tolerance inversely proportional to the input dimensionality. Moreover, this hard family of functions is realizable as neural networks using any activation function from a large family (including sigmoid and ReLU) with a small (sublinear in dimension) number of units in the single hidden layer, and whose output is a sum gate.
Joint work with Le Song, Santosh Vempala, and Bo Xie.
Antonio Blanca (Georgia Tech)
Title: Decay of correlations and non-local Markov chains
Abstract: In this talk we consider Markov chains for spin systems on the integer lattice graph Z^d. It has been well known since pioneering work from the early 1990’s that a certain decay of correlation property, known as strong spatial mixing (SSM), is a necessary and sufficient condition for fast mixing of the Gibbs sampler, where the state of a single vertex is updated in each step. In practice, non-local Markov chains are particularly popular from their potentially exponential speed-up, but these processes have largely resisted analysis. In this talk, we consider the effects of SSM on the rate of convergence to stationary of non-local Markov chains. We show that SSM implies fast mixing of several standard non-local chains, including general blocks dynamics, systematic scan dynamics and the Swendsen-Wang dynamics for the Ising/Potts model. Our proofs use a variety of techniques for the analysis of Markov chains including coupling, functional analysis and linear algebra.
Robert Schapire, Microsoft Research (NYC)
Title: The Contextual Bandits Problem: Techniques for Learning to Make High-Reward Decisions
Abstract: We consider how to learn through experience to make intelligent decisions. In the generic setting, called the contextual bandits problem, the learner must repeatedly decide which action to take in response to an observed context, and is then permitted to observe the received reward, but only for the chosen action. The goal is to learn to behave nearly as well as the best policy (or decision rule) in some possibly very large and rich space of candidate policies. This talk will describe progress on developing general methods for this problem and some of its variants.
Bio: Robert Schapire is a Principal Researcher at Microsoft Research in New York City. He received his PhD from MIT in 1991. After a short post-doc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991. In 2002, he became a Professor of Computer Science at Princeton University. He joined Microsoft Research in 2014. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize, and the 2004 Kanelakkis Theory and Practice Award (both of the last two with Yoav Freund). He is a fellow of the AAAI, and a member of both the National Academy of Engineering and the National Academy of Sciences. His main research interest is in theoretical and applied machine learning, with particular focus on boosting, online learning, game theory, and maximum entropy.