MSR/MIT Theory Day - Friday, June 20th

Location: Barton Room (first floor)
Microsoft Research, One Memorial Drive

10:30-11:15: Nikhil Bansal, "Minimizing Flow Time on Unrelated Machines"
11:15-12:00: Aaron Roth, "Correctness Protection via Differential Privacy"
12:00-1:00: Lunch Break
1:00-1:45: Ankur Moitra, "New Algorithms for Dictionary Learning"
1:45-2:30: David Woodruff, "Turnstile Streaming Algorithms Might as Well Be Linear Sketches"
2:30-3:00: Coffee Break
3:00-3:45: James Lee, "Anti-concentration of Smoothed Functions and Talagrand's Convolution Conjecture"

----------------------------------------------------------------------------------

Minimizing Flow Time on Unrelated Machines
Nikhil Bansal, Eindhoven University

The flow time of a job is the amount of time the job spends in the system, i.e. its completion time minus its arrival time, and is a widely studied measure of quality in scheduling. I will describe the first poly-logarithmic approximation algorithms for minimizing (i) the total flow time and (ii) the maximum flow time on unrelated machines. The main idea is to consider a new LP relaxation for these problems that has far fewer constraints, and to apply iterated rounding.

Joint work with Janardhan Kulkarni.

----------------------------------------------------------------------------------

Correctness Protection via Differential Privacy
Aaron Roth, UPenn

False discovery is a growing problem in scientific research. Despite sophisticated statistical techniques for controlling the false discovery rate and related statistics designed to protect against spurious discoveries, there is significant evidence that many published scientific papers contain incorrect conclusions.

In this talk we consider the role that adaptivity plays in this problem. A fundamental disconnect between the theorems that control the false discovery rate and the practice of science is that the theorems assume a fixed collection of hypotheses to be tested, selected non-adaptively before the data is gathered, whereas science is by definition an adaptive process, in which data is shared and re-used and hypotheses are generated after seeing the results of previous tests.

We note that false discovery cannot be prevented when a substantial number of adaptive queries are made to the data and the data is used naively -- i.e. when queries are answered exactly with their empirical estimates on a given finite data set. However, we show that, remarkably, there is a different way to evaluate statistical queries on a data set that allows even an adaptive analyst to make exponentially many queries to the data set, while guaranteeing that with high probability, all of the conclusions he draws generalize to the underlying distribution. Counter-intuitively, this technique involves actively perturbing the answers given to the data analyst, using techniques developed for privacy preservation -- but in our application, the perturbations are added entirely to increase the utility of the data.

Joint work with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, and Omer Reingold.
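To make the perturbation idea concrete, here is a minimal Python sketch of the standard Laplace mechanism from differential privacy, answering a statistical query (the mean of a [0,1]-valued predicate over the data) with noise calibrated to the query's sensitivity 1/n. The name answer_query and the choice epsilon = 0.1 are illustrative; the mechanism used in the talk for adaptive reuse may differ in its details.

    import numpy as np

    def answer_query(data, predicate, epsilon=0.1, rng=None):
        """Return a noisy answer to the statistical query E[predicate(x)].

        The exact empirical mean is perturbed with Laplace noise whose
        scale 1/(epsilon*n) matches the query's sensitivity 1/n, the
        standard calibration in differential privacy."""
        rng = rng or np.random.default_rng()
        n = len(data)
        empirical = sum(predicate(x) for x in data) / n  # exact empirical answer
        return empirical + rng.laplace(scale=1.0 / (epsilon * n))

An adaptive analyst who only ever sees such noisy answers is in the regime the abstract describes: it is the noise, not the analyst's good behavior, that protects the conclusions from overfitting the sample.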
----------------------------------------------------------------------------------

New Algorithms for Dictionary Learning
Ankur Moitra, MIT

Sparse representations play an essential role in many fields, including statistics, signal processing and machine learning. But can we efficiently learn a basis that enables a sparse representation, if one exists? This problem has many names, and is often called dictionary learning or sparse coding.

Here we study a stochastic model for this problem: there is an unknown $n \times m$ dictionary $A$, and our goal is to learn $A$ from random examples of the form $Y = AX$, where $X$ is sparse and is drawn from an appropriate distribution. Recently Spielman, Wang and Wright gave an algorithm that works when $A$ has full column rank. Here we give the first algorithms for the overcomplete setting ($m > n$); see also the independent, concurrent work of Agarwal et al. Our algorithms work for $\mu$-incoherent dictionaries, which have long been a central object of study in sparse recovery, and require $X$ to be $k$-sparse for $k$ up to approximately $\sqrt{n}/(C\mu \log n)$. Notably, this bound is within a logarithmic factor of the information-theoretic threshold for sparse recovery on incoherent dictionaries. Our work is based on new connections between dictionary learning and overlapping clustering in random graphs, and on new insights into when and why alternating minimization converges.

Joint work with Sanjeev Arora, Rong Ge and Tengyu Ma.

----------------------------------------------------------------------------------

Turnstile Streaming Algorithms Might as Well Be Linear Sketches
David Woodruff, IBM Almaden

In the turnstile model of data streams, an underlying vector x in {-m, -m+1, ..., m}^n is presented as a long sequence of arbitrary positive and negative integer updates to its coordinates. A randomized algorithm seeks to approximate a function f(x) with constant probability while making only a single pass over this sequence of updates and using a small amount of space. All known algorithms in this model are linear sketches: they sample a matrix A from a distribution on integer matrices in the preprocessing phase, maintain the linear sketch Ax while processing the stream, and at the end of the stream output an arbitrary function of Ax. One cannot help but ask: are linear sketches universal?

In this work we answer this question by showing that any 1-pass constant-probability streaming algorithm for approximating an arbitrary function f of x in the turnstile model can also be implemented by sampling a matrix A from the uniform distribution on a set of O(n log m) integer matrices, with entries of magnitude poly(n), and maintaining the linear sketch Ax. Furthermore, the logarithm of the number of possible states of Ax, as x ranges over {-m, -m+1, ..., m}^n, plus the amount of randomness needed to store A, is at most a logarithmic factor larger than the space required by the space-optimal algorithm.

Our result shows that to prove space lower bounds for 1-pass streaming algorithms, it suffices to prove lower bounds in the simultaneous model of communication complexity, rather than in the stronger 1-way model. Moreover, the fact that we may assume a linear sketch with polynomially bounded entries further simplifies existing lower bounds; e.g., for frequency moments we present a simpler proof of the $\tilde{\Omega}(n^{1-2/k})$ bit complexity lower bound without using communication complexity.

Joint work with Yi Li and Huy L. Nguyen.
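The linearity that makes such sketches maintainable is easy to see in code: a turnstile update (i, delta) changes Ax by delta times the i-th column of A, so the sketch can be updated without ever storing x. A minimal Python sketch, with a random sign matrix standing in, purely for illustration, for the distribution guaranteed by the theorem:

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 100, 10                        # dimension of x, number of sketch rows
    A = rng.choice([-1, 1], size=(r, n))  # illustrative sketch matrix
    x = np.zeros(n, dtype=int)            # kept here only to check correctness
    sketch = np.zeros(r, dtype=int)       # the algorithm stores only this (and A)

    def update(i, delta):
        """Process the turnstile update x_i += delta in O(r) time, using
        linearity: A(x + delta * e_i) = Ax + delta * A[:, i]."""
        x[i] += delta
        sketch[:] += delta * A[:, i]

    update(3, 5)
    update(7, -2)
    assert np.array_equal(sketch, A @ x)  # the sketch tracks Ax exactly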
----------------------------------------------------------------------------------

Anti-concentration of Smoothed Functions and Talagrand's Convolution Conjecture
James Lee, University of Washington

I will present a proof of the Gaussian case of Talagrand's convolution conjecture, which asserts that the noised version of an arbitrary non-negative function cannot be concentrated on any fixed scale; in other words, Markov's inequality is not tight for such functions. This follows from a more general logarithmic anti-concentration phenomenon for non-negative functions that are not too log-concave. The proof proceeds by analyzing a stochastic evolution of the Gaussian measure to our target measure. I will also discuss versions of this process on the discrete cube and the relationship to Fourier analysis and information theory.

Joint work with Ronen Eldan.
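In symbols: if $f \ge 0$ and $\int f \, d\gamma_n = 1$, where $\gamma_n$ is the standard Gaussian measure, Markov's inequality gives $\gamma_n(\{f \ge t\}) \le 1/t$, and this is tight in general. The result says that smoothing destroys the tight examples. A LaTeX rendering of the statement's shape (the $\sqrt{\log t}$ factor in Talagrand's conjectured bound is quoted from memory, not from the abstract, and constants are suppressed):

    f \ge 0,\quad \int f \, d\gamma_n = 1
    \;\Longrightarrow\;
    \gamma_n\big(\{\, P_s f \ge t \,\}\big) \;\le\; \frac{C(s)}{t \sqrt{\log t}}
    \qquad (t \ge e,\ s > 0),

where $P_s$ is the Ornstein-Uhlenbeck noise operator, so that $P_s f$ is the "noised version" of $f$ referred to above. In particular the tail is $o(1/t)$, i.e. Markov's inequality is never tight for smoothed non-negative functions.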