Smooth Sensitivity and Sampling
in Private Data Analysis
Adam Smith, Penn State University
Friday, April 13th, 2007, 3:30pm
Wean Hall 7220
Abstract
The goal of private data analysis is to release aggregate information
about a data set while protecting the privacy of the individuals
whose information the data set contains. The problem has received
attention from several areas of computer science and statistics.
I'll sketch, and briefly motivate, a definition of privacy developed
recently in the cryptography and theory literature, and spend
the bulk of the time looking at protocols for "output perturbation"
that satisfy the definition. These protocols compute correct
answers to a user's queries about the data set, but perturb
the answers with random noise before releasing them. Determining
the right amount and type of noise to add, however, is delicate.
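
For concreteness, here is a minimal sketch of the baseline approach
from earlier work on this definition (differential privacy): release
F(x) plus Laplace noise whose scale is the worst-case ("global")
sensitivity of F divided by the privacy parameter eps. The names
below are illustrative, not the talk's notation.

    import math
    import random

    def perturbed_release(F, x, global_sensitivity, eps):
        # Baseline output perturbation: add Laplace noise with scale
        # global_sensitivity / eps to the true answer F(x).  The noise
        # is drawn by inverse-CDF sampling from u uniform in (-1/2, 1/2).
        scale = global_sensitivity / eps
        u = random.random() - 0.5
        noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        return F(x) + noise

Because global sensitivity is a worst case over all possible
databases, the noise it dictates can dwarf the variability of F near
the actual instance; that gap is what the framework below addresses.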
I'll discuss a new, generic framework for output perturbation.
If a user asks to learn the value of a function F, our framework
calibrates the noise magnitude to the *smooth sensitivity* of
F on the database X --- a measure of variability of F in the neighborhood
of the instance X. The new framework greatly expands the applicability
of output perturbation. It also raises many interesting algorithmic
questions. In particular, to apply the framework one must compute or
approximate the smooth sensitivity of F on X. We show how to do
this efficiently for several different functions.
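
Here the smooth sensitivity of F at X is, roughly, the largest local
sensitivity in the neighborhood of X, damped exponentially with
distance: S(X) = max over databases Y of LS_F(Y) * exp(-eps * d(X, Y)),
where d is Hamming distance. As an example of such a computation, the
sketch below evaluates the smooth sensitivity of the median of values
clamped to an interval [lo, hi]; the quadratic loop and the boundary
conventions are simplifications for illustration, not the paper's
optimized algorithm.

    import math

    def smooth_sensitivity_median(x, eps, lo=0.0, hi=1.0):
        # S(x) = max over k of exp(-k * eps) * A(k), where A(k) is the
        # local sensitivity of the median at Hamming distance k from x.
        # Out-of-range entries take the boundary values lo and hi.
        x = sorted(x)
        n = len(x)
        m = (n + 1) // 2                  # median position (1-indexed)

        def entry(i):
            if i < 1:
                return lo
            if i > n:
                return hi
            return x[i - 1]

        best = 0.0
        for k in range(n + 1):
            # Changing k entries can move the median anywhere inside a
            # window of k + 1 gaps around position m; take the widest.
            ls_k = max(entry(m + t) - entry(m + t - k - 1)
                       for t in range(k + 2))
            best = max(best, math.exp(-k * eps) * ls_k)
        return best

We also give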
a generic procedure based on sampling that allows one to release
F(X) accurately on many databases X. This procedure is applicable
even when no efficient algorithm for approximating smooth sensitivity
of F is known or when F is given as a black box. We illustrate
the procedure by applying it to k-S.E.D. (k-means) clustering
and learning mixtures of Gaussians.
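
A sketch of the shape of this sampling procedure, under simplifying
assumptions: split the data into m disjoint random blocks, run the
black-box F on each block, and release a noisy median of the clamped
block outputs, with heavy-tailed noise scaled to the median's smooth
sensitivity. The block counts, the aggregator, and the noise
constants here are illustrative; the paper gives the exact
calibration.

    import math
    import random

    def sample_and_aggregate(data, F, eps, m, lo=0.0, hi=1.0):
        # Run F on m disjoint random blocks.  Changing one record
        # changes at most one block's output, so the median of the
        # block outputs has low smooth sensitivity whenever F is
        # stable on random subsamples of the data.
        data = list(data)
        random.shuffle(data)
        blocks = [data[i::m] for i in range(m)]
        z = sorted(min(max(F(b), lo), hi) for b in blocks)
        s = smooth_sensitivity_median(z, eps, lo, hi)   # sketch above
        # One admissible noise choice: a standard Cauchy variate
        # scaled to the smooth sensitivity (constants omitted).
        noise = (s / eps) * math.tan(math.pi * (random.random() - 0.5))
        return z[len(z) // 2] + noise
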
Based on joint work with Kobbi Nissim and Sofya
Raskhodnikova, to appear at STOC 2007.