Title: The Big Data Theory and Randomized Algorithms
Advances in machine learning and developments in modern massively parallel computational systems motivate a large number of challenging questions for the theory of computing. How can we make sense of massive amounts of noisy data? How can we cluster billions of points, or compare large images? How can we design scalable algorithms for distributed systems?
I will present examples of settings in which these questions can be addressed through the design of randomized algorithms:
- Sublinear algorithms for testing properties of high-dimensional noisy data (e.g., monotonicity, the Lipschitz property, and convexity)
- Distributed algorithms for geometric problems in Euclidean space (minimum-cost spanning tree, Earth Mover's Distance)
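As a small taste of the first setting, here is a minimal Python sketch of a classic sublinear monotonicity tester for a function on the line with distinct values (a binary-search spot-checker in the spirit of the Ergün et al. spot-checkers; this is an illustrative assumption on my part, not the high-dimensional testers of the talk). It accepts every monotone function and, for a function that is eps-far from monotone, rejects with constant probability using only O(log n / eps) queries.

```python
import math
import random

def monotonicity_tester(f, n, eps, rng):
    """Test whether f: {0,...,n-1} -> R (with DISTINCT values) is monotone.

    Accepts every monotone f; rejects any f that is eps-far from monotone
    (i.e., an eps fraction of values must change to make it monotone)
    with probability at least 1 - e^{-2}, using O(log n / eps) queries.
    """
    t = math.ceil(2 / eps)          # number of random spot checks
    for _ in range(t):
        i = rng.randrange(n)
        v = f(i)
        lo, hi = 0, n - 1
        while lo <= hi:             # binary search for v, as if f were sorted
            mid = (lo + hi) // 2
            if f(mid) == v:
                break               # search landed back on i (values distinct)
            elif f(mid) < v:
                lo = mid + 1
            else:
                hi = mid - 1
        else:
            return False            # search failed: witness of non-monotonicity
    return True

# A monotone function is always accepted:
print(monotonicity_tester(lambda i: 2 * i, 1000, 0.1, random.Random(0)))  # True
```

The key point, echoing the abstract, is the information-theoretic measure: correctness is argued purely in terms of the number of queries to f, with no computational assumptions.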
Through these examples I will illustrate how information-theoretic measures of performance, such as query complexity and the number of rounds of communication, play a key role in the design and analysis of such algorithms. I will show that even in the worst case these fundamental problems can be solved almost optimally with respect to these measures. The information-theoretic nature is crucial here, since lower bounds can be proved unconditionally, i.e., without computational hardness assumptions.
This talk will be primarily based on two papers which will appear in STOC’14 (joint work with Andoni, Berman, Nikolov, Onak and Raskhodnikova) as well as other work by the speaker.