The Price of Privacy and the Limits of LP Decoding
Kunal Talwar
Microsoft Research
Friday, February 23rd, 2007, 3:30 pm
Wean Hall 7220
Abstract
Suppose one encodes an n-dimensional real vector x as y=Ax, for a suitably chosen A, and an adversary arbitrarily corrupts some of the entries of y to get y'. The surprising fact, proved by Donoho, is that by taking the entries of A as i.i.d. Gaussians, the vector x can be *exactly* recovered by minimizing the L_1 norm |y' - Ax'| over all x', provided only a tiny constant fraction of the entries of y were corrupted. Our principal result is the discovery of a sharp threshold rho* ~ 0.239, such that this L_1 minimization succeeds up to any error rate less than rho*, but fails for rates > rho*. This resolves an open question of Candes, Rudelson, Tao, and Vershynin.
Our interest in this problem arose while investigating the price, in accuracy, of protecting privacy in a statistical database. Our results say that any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
This is joint work with Cynthia Dwork and Frank McSherry.