ALADDIN Center
Carnegie Mellon University, School of Computer Science
Workshops
Privacy in DATA: Abstracts

See the Privacy in Data Workshop's website.


Josh Benaloh
Microsoft

 

Title: The Current State of Cryptographic Election Protocols

Abstract:
For nearly twenty years, two broad cryptographic paradigms for conducting verifiable secret-ballot elections have vied for supremacy. Each has unique advantages and disadvantages, and each has been improved over time by numerous authors in numerous papers. This talk will describe the paradigms, their history, and the winner of this long-running competition.

Preliminary presentation: (ppt) (html) (pdf)

 


Latanya Sweeney
CMU

 


Title: Privacy in D.A.T.A.

Preliminary presentation: (pdf)


Johannes Gehrke

Cornell University


Title: On Privacy Breaches in Privacy-Preserving Data Mining

Abstract:
Consider the problem of building accurate data mining models over personal data, while protecting privacy at the individual level. In particular, consider the class of solutions where each person randomizes their data before sending it to the server for building the model. We will first present a framework for mining association rules from transactions consisting of categorical items where the data has been randomized to preserve privacy of individual transactions. While it is feasible to recover association rules and preserve privacy using a straightforward "uniform" randomization, the discovered rules can unfortunately be exploited to find privacy breaches. We analyze the nature of privacy breaches and propose a class of randomization operators that are much more effective than uniform randomization in limiting the breaches. We then describe a general randomization-based approach for limiting privacy breaches that we call "amplification", and we define new information measures that take privacy breaches into account when quantifying the amount of privacy preserved by a specific randomization technique.
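
As a rough illustration of the per-transaction randomization being discussed (a hedged sketch of the flavor of a "uniform" operator, not the authors' exact construction), each item is kept with probability p and otherwise replaced by a random catalog item:

```python
import random

def uniform_randomize(transaction, catalog, p=0.5):
    """Illustrative 'uniform' randomization: each item of a transaction is kept
    with probability p and otherwise replaced by an item drawn uniformly from
    the catalog.  The server only ever sees the randomized output, yet item
    frequencies (and hence association rules) can still be estimated across
    many such transactions."""
    randomized = set()
    for item in transaction:
        if random.random() < p:
            randomized.add(item)                      # keep the true item
        else:
            randomized.add(random.choice(catalog))    # replace with a random item
    return randomized

# Example: the server never learns which of these items were really bought.
catalog = ["bread", "milk", "razor", "beer", "eggs"]
print(uniform_randomize({"bread", "milk", "razor"}, catalog, p=0.5))
```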

This is joint work with Rakesh Agrawal, Alexandre Evfimievski, and Ramakrishnan Srikant.

Preliminary presentation: (pdf)

 


Benny Pinkas

Hewlett Packard


Title: Privacy-preserving learning of decision trees

Abstract:
We consider the scenario of two parties, each holding a private database, who wish to cooperate by computing a data mining algorithm on the union of their databases. The privacy requirement is that any information a party learns during the computation could also be learned from that party's own input and output.

Cryptographic research has shown that this problem has a polynomial-time protocol for any function. However, that solution is based on generic constructions that build a circuit computing the algorithm on the inputs; when the input consists of huge databases, this type of solution is clearly impractical.

We show that efficient solutions are possible for large databases. This is demonstrated for the decision-tree learning algorithm ID3. Our techniques are based on doing most of the computation locally and on reducing the problem to a secure computation of the logarithm function. The resulting complexity is only logarithmic in the number of items in the database (multiplied by some other parameters), and the number of communication rounds equals the depth of the decision tree.
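
To make the reduction concrete, here is a minimal sketch (my illustration, not the protocol itself) of the x·log x quantity at the heart of ID3's information-gain computation when each class count is split between the two parties:

```python
import math

def x_log_x_from_shares(count_a, count_b):
    """In ID3, information gain is built from terms of the form x*log(x), where
    x is a class count.  In the two-party setting of the talk each count is
    split as x = x_A + x_B, with x_A and x_B held privately; the protocol's
    core is a secure computation that produces random shares of
    (x_A + x_B) * log(x_A + x_B).  This helper only shows the target quantity
    in the clear."""
    x = count_a + count_b
    return x * math.log(x) if x > 0 else 0.0

# e.g. 120 positive examples held by party A and 80 by party B at some node:
print(x_log_x_from_shares(120, 80))
```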

Joint work with Yehuda Lindell.

Preliminary presentation: (pdf)

 


Rebecca Wright
Stevens Institute of Technology


Title: Privacy-protecting statistic computation: theory and practice

Abstract:
Suppose a client wishes to compute some aggregate statistics over a privately owned database. The data owner wants to protect the privacy of the personal information in the database, while the client does not want to reveal his selection criteria. Privacy-protecting statistical analysis allows the client and the data owner to interact in such a way that the client learns the desired aggregate statistics but nothing further about the data, and the data owner learns nothing about the client's query. Motivated by this application, we consider the more general problem of "selective private function evaluation," in which a client can privately compute an arbitrary function over a database. I will present various approaches for constructing efficient selective private function evaluation protocols, both for the general problem and for privacy-protecting statistical analysis. I will also discuss our prototype implementation of some of these protocols and our initial experimental results.
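
One generic way to realize the simplest instance of this functionality, a sum over entries the client privately selects, is additively homomorphic encryption. The sketch below assumes the python-paillier package (`phe`) is available and is only an illustration of the general idea, not the protocols presented in the talk:

```python
# Assumes `pip install phe` (python-paillier); a private selected sum via
# additively homomorphic encryption, not necessarily the talk's protocols.
from phe import paillier

database = [12, 7, 31, 5, 19]        # data owner's private values
selection = [1, 0, 1, 0, 1]          # client's private 0/1 selection criteria

pub, priv = paillier.generate_paillier_keypair()
enc_selection = [pub.encrypt(s) for s in selection]     # sent to the data owner

# The data owner combines ciphertexts with its data; it never sees the selection.
enc_result = enc_selection[0] * database[0]
for enc_bit, value in zip(enc_selection[1:], database[1:]):
    enc_result = enc_result + enc_bit * value

# The client decrypts only the aggregate: 12 + 31 + 19 = 62.
print(priv.decrypt(enc_result))
```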

 


Kobbi Nissim

NEC Labs, Princeton

Title: Revealing information while preserving privacy

Abstract:
Envision a hospital database containing the medical history of some large population. On one hand, the hospital is interested in protecting the privacy of its patients and in leaking no medical information that could be related to a specific patient. On the other hand, the hospital would like to advance scientific research that is based, among other things, on statistics of the information in the database.

We propose a mathematical model for this problem, and present (tight) lower bounds limiting the amount of information that can be revealed without violation of privacy.
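
For a sense of the form such a bound can take (a hedged paraphrase in a database-reconstruction setting, not necessarily the talk's exact statement), model the database as a bit vector answered through noisy subset-sum queries:

```latex
% Model: the database is a bit vector, each query a subset sum answered with
% bounded perturbation E.
\[
  d \in \{0,1\}^n, \qquad
  \Bigl|\, a(q) - \sum_{i \in q} d_i \,\Bigr| \le E
  \quad \text{for every answered query } q \subseteq [n].
\]
% Flavor of the lower bound: if E = o(\sqrt{n}), a polynomial-time adversary
% asking polynomially many queries can output a candidate d' agreeing with d
% on all but o(n) entries -- nearly full reconstruction -- so meaningful
% privacy forces perturbation of magnitude \Omega(\sqrt{n}).
```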

Based on joint work with Irit Dinur and Cynthia Dwork

Preliminary presentation: (ppt) (html) (pdf)



Poorvi Vora

Hewlett-Packard Co.


Title: The channel coding theorem and the security of binary randomization

Abstract:
We treat the binary symmetric randomization protocol (flipping a bit with probability p) as a channel carrying the information to be protected, and model a tracker as a party seeking efficient communication over that channel. Error-correcting codes then correspond to attacks, and Shannon's theorems provide bounds on attack efficiency. Our main result is a tight lower bound on the query complexity, per data point, of any attack that makes legitimate use of the protocol to reduce its error to arbitrarily low values; the bound is independent of the target error.
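
A minimal illustration of the shape of such a bound (my sketch of the channel-coding framing, not necessarily the talk's exact constant): a binary symmetric channel with crossover probability p has capacity 1 − H(p), so reliably recovering each protected bit costs at least 1/(1 − H(p)) queries.

```python
import math

def binary_entropy(p):
    """H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def queries_per_bit(p):
    """Channel-coding view sketched above: randomized responses form a binary
    symmetric channel of capacity C = 1 - H(p), so any attack driving its error
    arbitrarily low needs at least 1/C queries per protected bit, independent
    of the target error."""
    return 1.0 / (1.0 - binary_entropy(p))

print(queries_per_bit(0.25))   # ~5.3 queries per data point when bits flip w.p. 0.25
```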

Preliminary presentation: (pps) (html) (pdf)

 


Ryan Williams


Title: Optimal k-Anonymity using Generalization and Suppression is NP-Hard

Abstract:
K-anonymization using generalization and suppression is a method for protecting the privacy of individuals while allowing data sharing. An optimal k-anonymization is one that achieves the required privacy protection (ensuring that every individual is a member of a group of k "indistinguishable" individuals) while minimizing the amount of data loss incurred by the privacy protection. We prove that finding an optimal k-anonymization is NP-hard in general.
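
To pin down the object being optimized (checking the condition is easy; finding the minimum-loss generalization and suppression is what the talk proves NP-hard), here is a small sketch of the k-anonymity condition on a released table, with illustrative field names of my own:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """k-anonymity condition: every combination of quasi-identifier values in
    the released table (after generalization and suppression) is shared by at
    least k records, so each individual hides in a group of k."""
    keys = [tuple(r[a] for a in quasi_identifiers) for r in records]
    return all(count >= k for count in Counter(keys).values())

released = [                       # ZIP generalized to a prefix, age to a range
    {"zip": "152**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "152**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "152**", "age": "30-39", "diagnosis": "flu"},
]
print(is_k_anonymous(released, ["zip", "age"], k=3))   # True
```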

Preliminary presentation: (pdf)



Michael Shamos
CMU


Title: Mathematics and the Privacy Laws

Abstract:
Real-world privacy problems are motivated by laws, committed privacy policies and social expectations. Statutes are not expressed mathematically and therefore can impose inconsistent requirements or mandate processes that do not fulfill their stated objectives. In this talk we look at some recent privacy laws and discuss the possibility (or not) of implementing automated systems to administer them. This requires us to look carefully at functions and databases to determine what can be learned about them from queries and how metadata about a function (e.g. knowing whether it has a fixed point) can compromise the function.

Preliminary presentation: (ppt) (html) (pdf)



Joe Kilian

NEC Labs, Princeton


Title: Secure Computation (A Survey)

Abstract:
This tutorial, geared to nonspecialists, gives an elementary overview of secure computation. What is it? What is it good for? When is it possible? When is it practical? Since the talk covers two decades of research by a large community of researchers, some details may be omitted.

Preliminary presentation: (ppt) (html) (pdf)



Cynthia Dwork


Title: A Cryptography-Flavored Approach to Privacy in Public Databases

Preliminary presentation: (pdf)



Bradley Malin


Title: Trail Re-Identification: Learning Who You Are From Where You Have Been

Abstract:
This talk provides algorithms for learning the identities of individuals from the trails of seemingly anonymous information they leave behind. Consider online consumers, who have the IP addresses of their computers logged at each website visited; many falsely believe they cannot be identified. The term "re-identification" refers to correctly relating seemingly anonymous data to explicitly identifying information (such as the name or address) of the person who is the subject of those data. Re-identification has historically been associated with data released from a single data holder. This work extends the concept to "trail re-identification," in which a person is related to a trail of seemingly anonymous and homogeneous data left across different locations. The three novel algorithms presented in this talk perform trail re-identification by exploiting the fact that some locations also capture explicitly identifying information and subsequently provide the unidentified data and the identified data as separate data releases. Intersecting occurrences in these two kinds of data can reveal identities. The algorithms re-identify individuals based on the uniqueness of trails across unidentified and identified datasets, and they differ in the amount of completeness and multiplicity assumed in the data. Successful re-identifications are reported for DNA sequences left by hospital patients and for IP addresses left by online consumers. These algorithms are extensible to tracking collocations of people, which is an objective of homeland defense surveillance.
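
A minimal sketch of the uniqueness idea (my illustration, with hypothetical names and locations, not one of the talk's three algorithms verbatim): an anonymous trail is linked to an identity only when exactly one identified trail can contain it.

```python
def reidentify_unique_trails(anon_trails, id_trails):
    """Illustrative trail re-identification: an anonymous trail (the set of
    locations where a DNA sequence or IP address appeared) is linked to a name
    when exactly one identified trail contains it."""
    links = {}
    for datum, locations in anon_trails.items():
        candidates = [name for name, trail in id_trails.items() if locations <= trail]
        if len(candidates) == 1:           # trail is unique -> re-identified
            links[datum] = candidates[0]
    return links

anon_trails = {"ip_10.0.0.7": {"siteA", "siteC"}, "ip_10.0.0.9": {"siteB"}}
id_trails = {"Alice": {"siteA", "siteC", "siteD"},
             "Bob": {"siteB"},
             "Carol": {"siteB", "siteC"}}
print(reidentify_unique_trails(anon_trails, id_trails))
# {'ip_10.0.0.7': 'Alice'}  (the other trail matches both Bob and Carol, so no link)
```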


Ran Canetti
IBM Research


Title: Jointly Restraining the Big Brother: Using Cryptography to Reconcile Privacy with Data Aggregation

Abstract:
Large-scale aggregation of data regarding individuals may be helpful in many cases. Examples include using medical records for research purposes, using customer behavior to improve and customize products and services, and using electronic behavior patterns to help law enforcement. However, such large-scale aggregation of data can easily compromise the privacy of individuals and thus undermine the freedom of our society. Consequently, one must provide mechanisms for controlling the aggregation activity and for preventing abuse and leakage of the aggregated data. Such tasks are squarely within the domain of cryptographic protocol design.

This talk surveys the applicability of cryptographic techniques for general protocol design to the above problems. We start with the universal feasibility results from the 1980s and review their strong points and weaknesses. We then concentrate on more recent general techniques. These techniques provide better composability, efficiency, and resilience to network delays, and may thus be more suitable as a basis for actual solutions. Finally, we discuss shortcomings of the above techniques and some directions for further research. Part of the talk is based on joint work with Lindell, Ostrovsky, and Sahai.

Preliminary presentation: (ppt) (html) (pdf)

 


Ralph Gross


Title: Preserving Privacy by De-identifying Facial Images

Abstract:
In the context of sharing video surveillance data, a significant threat to privacy is face recognition software, which can automatically identify known people from a driver's license photo database, for example, and thereby track people regardless of suspicion. This talk introduces an algorithm to protect the privacy of individuals in video surveillance data by de-identifying faces such that many facial characteristics remain but the face cannot be reliably recognized. A trivial solution to de-identifying faces involves blacking out each face. This thwarts any possible face recognition, but because all facial details are obscured, the result is of limited use. Many ad hoc attempts, such as covering eyes or randomly perturbing image pixels, fail to thwart face recognition because of the robustness of face recognition methods. This talk presents a new privacy-enabling algorithm, named k-Same, that scientifically limits the ability of face recognition software to reliably recognize faces while maintaining facial details in the images. The algorithm determines similarity between faces based on a distance metric and creates new faces by averaging image components, which may be the original image pixels (k-Same-Pixel) or eigenvectors (k-Same-Eigen). Results are presented on a standard collection of real face images with varying k.
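
A minimal numerical sketch of the averaging idea (my illustration on raw pixel vectors; the published k-Same algorithm selects and reuses the averaged faces more carefully to guarantee the 1-in-k bound):

```python
import numpy as np

def k_same_pixel_sketch(faces, k):
    """Replace each face by the average of the k faces closest to it under a
    plain Euclidean distance on pixel vectors (the k-Same-Pixel flavor; using
    eigenface coefficients instead would give the k-Same-Eigen flavor)."""
    faces = np.asarray(faces, dtype=float)             # shape: (n_faces, n_pixels)
    deidentified = np.empty_like(faces)
    for i, face in enumerate(faces):
        nearest = np.argsort(np.linalg.norm(faces - face, axis=1))[:k]
        deidentified[i] = faces[nearest].mean(axis=0)  # average of k similar faces
    return deidentified

# Four toy "faces" of three pixels each, de-identified with k = 2:
print(k_same_pixel_sketch([[0, 0, 0], [1, 1, 1], [10, 10, 10], [11, 11, 11]], k=2))
```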



Samuel Edoho-Eket


Title: Answering "How Many?" Over a Distributed, Privacy-preserving Surveillance Network

Abstract:
Collecting sensitive data from disparate data holders is at the center of bio-terrorism surveillance efforts in the United States. Current approaches require data holders to share data with a centralized authority in order for that authority to answer basic surveillance questions. This talk examines an alternative in which answers to basic surveillance questions are computed over a network of data-holder machines: the final result, and not the data elements themselves, is provided to the central authority. These computations are achieved while protecting the confidentiality of each data holder's information and the privacy of the individuals whose information is the subject of the computation. For example, a simple surveillance question is the "how many" question, such as "how many prescriptions of drug x were sold today?" or "how many students were absent from school today?" Requesting the answer from schools and stores is often met with intense opposition because of confidentiality concerns for the data holder's enterprise. In this talk, I demonstrate a way to answer these kinds of questions over a distributed network while preserving the confidentiality of the data holders.
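
One classic way to compute such a "how many" answer without revealing any individual count is a ring-based secure sum; the sketch below illustrates that general idea and is not necessarily the protocol presented in the talk:

```python
import random

def secure_sum(counts, modulus=2**32):
    """Ring-based secure sum (illustration only): the initiator masks its count
    with a random value, each data holder adds its own count to the running
    total it receives, and the initiator removes the mask at the end.  No
    holder sees another holder's individual count."""
    mask = random.randrange(modulus)
    running = (mask + counts[0]) % modulus        # initiator masks its count
    for c in counts[1:]:                          # passed around the ring
        running = (running + c) % modulus
    return (running - mask) % modulus             # initiator unmasks the total

# "How many prescriptions of drug x were sold today?" across four pharmacies:
print(secure_sum([12, 0, 7, 3]))    # 22, with no individual count revealed
```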


Stephen E. Fienberg


Title: Preserving Confidentiality AND Providing Adequate Data for Statistical Modeling: The Role of Partial and Perturbed Data

Abstract:
This presentation emphasizes the tradeoffs between disclosure limitation and utility, and strategies for assessing both. I will illustrate with our work on tables of counts.

Preliminary presentation: (ppt) (html) (pdf)

See the Privacy in Data Workshop's website.


 

This material is based upon work supported by the National Science Foundation under Grant No. 0122581.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.