Smooth Sensitivity and Sampling
in Private Data Analysis
Adam Smith, Penn State University
Friday, April 13th, 2007, 3:30pm
Wean Hall 7220
Abstract
The goal of private data analysis is to release aggregate information
about a data set while protecting the privacy of the individuals
whose information the data set contains. The problem has received
attention from several areas of computer science and statistics.
I'll sketch, and briefly motivate, a definition of privacy developed
recently in the cryptography and theory literature, and spend
the bulk of the time looking at protocols for "output perturbation"
that satisfy the definition. These protocols compute correct
answers to a user's queries about the data set, but perturb
the answers with random noise before releasing them. Determining
the right amount and type of noise to add, however, is delicate.
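
For concreteness, here is a minimal sketch of the baseline approach
from earlier work on this definition (differential privacy): release
F(x) plus Laplace noise whose scale is the worst-case ("global")
sensitivity of F divided by the privacy parameter eps. The names
below are illustrative, not the talk's notation.

    import math
    import random

    def perturbed_release(F, x, global_sensitivity, eps):
        # Baseline output perturbation: add Laplace noise with scale
        # global_sensitivity / eps to the true answer F(x).  The noise
        # is drawn by inverse-CDF sampling from u uniform in (-1/2, 1/2).
        scale = global_sensitivity / eps
        u = random.random() - 0.5
        noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        return F(x) + noise

Because global sensitivity is a worst case over all possible
databases, the noise it dictates can dwarf the variability of F near
the actual instance; that gap is what the framework below addresses.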
I'll discuss a new, generic framework for output perturbation.
If a user asks to learn the value of a function F, our framework
calibrates the noise magnitude to the *smooth sensitivity* of
F on the database X --- a measure of variability of F in the neighborhood
of the instance X. The new framework greatly expands the applicability
of output perturbation. It also raises many interesting algorithmic
questions. In particular, to apply the framework one must compute or
approximate the smooth sensitivity of F on X. We show how to do
this efficiently for several different functions.
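
Here the smooth sensitivity of F at X is, roughly, the largest local
sensitivity in the neighborhood of X, damped exponentially with
distance: S(X) = max over databases Y of LS_F(Y) * exp(-eps * d(X, Y)),
where d is Hamming distance. As an example of such a computation, the
sketch below evaluates the smooth sensitivity of the median of values
clamped to an interval [lo, hi]; the quadratic loop and the boundary
conventions are simplifications for illustration, not the paper's
optimized algorithm.

    import math

    def smooth_sensitivity_median(x, eps, lo=0.0, hi=1.0):
        # S(x) = max over k of exp(-k * eps) * A(k), where A(k) is the
        # local sensitivity of the median at Hamming distance k from x.
        # Out-of-range entries take the boundary values lo and hi.
        x = sorted(x)
        n = len(x)
        m = (n + 1) // 2                  # median position (1-indexed)

        def entry(i):
            if i < 1:
                return lo
            if i > n:
                return hi
            return x[i - 1]

        best = 0.0
        for k in range(n + 1):
            # Changing k entries can move the median anywhere inside a
            # window of k + 1 gaps around position m; take the widest.
            ls_k = max(entry(m + t) - entry(m + t - k - 1)
                       for t in range(k + 2))
            best = max(best, math.exp(-k * eps) * ls_k)
        return best

We also give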
a generic procedure based on sampling that allows one to release
F(X) accurately on many databases X. This procedure is applicable
even when no efficient algorithm for approximating smooth sensitivity
of F is known or when F is given as a black box. We illustrate
the procedure by applying it to k-S.E.D. (k-means) clustering
and learning mixtures of Gaussians.
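
A sketch of the shape of this sampling procedure, under simplifying
assumptions: split the data into m disjoint random blocks, run the
black-box F on each block, and release a noisy median of the clamped
block outputs, with heavy-tailed noise scaled to the median's smooth
sensitivity. The block counts, the aggregator, and the noise
constants here are illustrative; the paper gives the exact
calibration.

    import math
    import random

    def sample_and_aggregate(data, F, eps, m, lo=0.0, hi=1.0):
        # Run F on m disjoint random blocks.  Changing one record
        # changes at most one block's output, so the median of the
        # block outputs has low smooth sensitivity whenever F is
        # stable on random subsamples of the data.
        data = list(data)
        random.shuffle(data)
        blocks = [data[i::m] for i in range(m)]
        z = sorted(min(max(F(b), lo), hi) for b in blocks)
        s = smooth_sensitivity_median(z, eps, lo, hi)   # sketch above
        # One admissible noise choice: a standard Cauchy variate
        # scaled to the smooth sensitivity (constants omitted).
        noise = (s / eps) * math.tan(math.pi * (random.random() - 0.5))
        return z[len(z) // 2] + noise
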
Based on joint work with Kobbi Nissim and Sofya
Raskhodnikova, to appear at STOC 2007.