See the Privacy in Data Workshop's website.

Josh Benaloh
Microsoft

Title: The
Current State of Cryptographic Election Protocols
Abstract:
For nearly twenty years, two broad cryptographic paradigms
for conducting verifiable secret-ballot elections have vied
for supremacy. Each has unique advantages and disadvantages,
and each has been improved over time by numerous authors
in numerous papers. This talk will describe the paradigms,
their history, and the winner of this long-running competition.
Preliminary presentation: (ppt)
(html)
(pdf)



Latanya
Sweeney
CMU

Title: Privacy in D.A.T.A.
Preliminary presentation: (pdf) 


Johannes Gehrke
Cornell University 
Title: On Privacy Breaches
in Privacy-Preserving Data Mining
Abstract:
Consider the problem of building accurate data mining models
over personal data, while protecting privacy at the individual
level. In
particular, consider the class of solutions where each person
randomizes their data before sending it to the server for
building the
model. We will first present a framework for mining association
rules from transactions consisting of categorical items
where the data has been randomized to preserve privacy of
individual transactions. While it is feasible to recover
association rules and preserve privacy using a straightforward
"uniform" randomization, the discovered rules can unfortunately
be exploited to find privacy breaches. We analyze the nature
of privacy breaches and propose a class of randomization
operators that are much more effective than uniform randomization
in limiting the breaches. We then discuss a general randomization-based
approach for limiting privacy breaches that we call "amplification",
and we define new information measures that take privacy
breaches into account when quantifying the amount of privacy
preserved by a specific randomization technique.
This is joint work with Rakesh
Agrawal, Alexandre Evfimievski, and Ramakrishnan Srikant.
Preliminary presentation: (pdf)
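As a rough illustration of the "uniform" randomization the abstract starts from, the sketch below replaces each item of a transaction, independently, with a uniformly random item from the universe. The item universe, the retention probability `p`, and the function name are all illustrative assumptions, not the paper's exact operator; the abstract's point is precisely that better-designed operators limit breaches more effectively than this baseline.

```python
import random

ITEMS = list(range(100))  # hypothetical universe of categorical items

def uniform_randomize(transaction, p=0.5, rng=random):
    """Simplified 'uniform' randomization: keep each true item with
    probability p; otherwise substitute an item drawn uniformly from
    the universe. (Illustrative baseline, not the paper's operator.)"""
    out = set()
    for item in transaction:
        if rng.random() < p:
            out.add(item)               # retain the true item
        else:
            out.add(rng.choice(ITEMS))  # substitute a random item
    return out
```

Association-rule mining then runs over the randomized transactions, with support estimates corrected for the known randomization parameters.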



Benny Pinkas
Hewlett Packard 
Title: Privacy-preserving
learning of decision trees
Abstract:
We consider the scenario of two parties having private databases
that wish to cooperate by computing a data mining algorithm
on the union of their databases. The privacy requirement
means that any information learned by a party during the
computation could be learned from this party's input and
output.
Cryptographic research has shown that the above problem has
a polynomial-time protocol for any function. However,
this solution is based on generic constructions that involve
constructing a circuit computing the algorithm on the inputs.
In this case, where the input consists of huge databases,
this type of solution is clearly impractical.
We show that efficient solutions are possible for large
databases. This is demonstrated for the decision-tree learning
algorithm ID3. Our techniques are based on doing most of the
computation locally and reducing the problem to a secure
computation of the logarithm function. The resulting complexity
is only logarithmic in the number of items in the database
(multiplied by some other parameters), and the number of
communication rounds is equal to the depth of the decision tree.
Joint work with Yehuda
Lindell.
Preliminary presentation: (pdf)
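For context on the reduction the abstract describes, here is the plain (non-secure) computation at the heart of ID3: each split is chosen by information gain, and the only non-local ingredient is the x·log x terms inside the entropies, which the protocol evaluates via a secure logarithm computation. The row/attribute representation below is an illustrative assumption, not the paper's interface.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class labels in S.
    Each term is built from an x*log(x) count, which is the piece
    the Lindell-Pinkas protocol computes securely."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, label):
    """Information gain of splitting `rows` (a list of dicts) on
    attribute `attr`, the quantity ID3 maximizes at each node."""
    n = len(rows)
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value, count in Counter(r[attr] for r in rows).items():
        subset = [r[label] for r in rows if r[attr] == value]
        remainder += (count / n) * entropy(subset)
    return base - remainder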



Rebecca Wright
Stevens Institute of Technology 
Title: Privacy-protecting
statistics computation: theory and practice
Abstract:
Suppose a client wishes to compute some aggregate statistics
on a privately owned database. The data owner wants to
protect the privacy of the personal information in the
database, while the client does not want to reveal his selection
criteria. Privacy-protecting statistical analysis allows
the client and data owner to interact in such a way that
the client learns the desired aggregate statistics but
does not learn anything further about the data; the data
owner learns nothing about the client's query. Motivated by this
application, we consider the more general problem of "selective
private function evaluation," in which a client can
privately compute an arbitrary function over a database.
I will present various approaches for constructing efficient
selective private function evaluation protocols, both for
the general problem and for privacy-protecting statistical
analysis. I will also discuss our prototype implementation
of some of these protocols and our initial experimental
results.



Kobbi Nissim
NEC Labs, Princeton 
Title: Revealing
information while preserving privacy
Abstract:
Envision a hospital database containing the medical
history of some large population. On one hand, the hospital
is interested in protecting the privacy of its patients and
leaking no medical information that could be related to
a specific patient. On the other hand, the hospital would
like to advance scientific research, which is based (among
other things) on statistics of the information in the database.
We propose a mathematical model for this problem and present
(tight) lower bounds limiting the amount of information
that can be revealed without violation of privacy.
Based on joint work with Irit Dinur and Cynthia Dwork.
Preliminary presentation: (ppt)
(html)
(pdf)
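The model the abstract refers to can be sketched as a statistical oracle: a curator holds a database of bits and answers subset-sum queries with bounded additive perturbation. The Dinur-Nissim lower bounds say that if the perturbation is too small relative to sqrt(n), an adversary can reconstruct almost the entire database, so the noise bound below is the privacy knob. The interface and parameter names are illustrative assumptions, not the paper's notation.

```python
import random

def make_noisy_oracle(bits, noise_bound, rng=random):
    """Return an oracle answering subset-sum queries over a private
    list of bits, perturbed by at most `noise_bound` in absolute
    value. The lower bounds in the talk imply that perturbation
    o(sqrt(n)) permits near-total reconstruction of `bits`, so
    meaningful privacy requires noise on the order of sqrt(n)."""
    def oracle(query_indices):
        true_sum = sum(bits[i] for i in query_indices)
        return true_sum + rng.uniform(-noise_bound, noise_bound)
    return oracle
```

With `noise_bound = 0` the oracle is exact and trivially reconstructible; raising the bound trades accuracy of the statistics for privacy.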



Poorvi Vora
Hewlett-Packard Co. 
Title: The
channel coding theorem and the security of binary randomization
Abstract:
We treat the binary symmetric randomization protocol (flipping
a bit with probability p) as a channel for the information
to be protected, with a tracker wanting efficient communication
over the channel. Error-correcting codes correspond to attacks,
and Shannon's theorems provide bounds on attack efficiency.
Our main result is a tight lower bound on the per-data-point
query complexity of any attack that makes legitimate use of the
protocol to reduce error to arbitrarily low values. This
bound is independent of the error.
Preliminary presentation: (pps)
(html)
(pdf)
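The protocol being modeled as a channel is classical randomized response: each respondent reports the complement of their true bit with probability p, and the aggregate fraction of 1s can still be estimated without bias. A minimal sketch, with p and the estimator as standard textbook choices rather than anything specific to this talk:

```python
import random

def flip(bit, p, rng=random):
    """Binary symmetric randomization: report the complement of the
    true bit with probability p (the channel's crossover probability)."""
    return bit ^ (rng.random() < p)

def estimate_fraction(reported, p):
    """Unbiased estimate of the true fraction pi of 1s. Since
    E[mean of reports] = pi*(1-p) + (1-pi)*p, solving gives
    pi = (r - p) / (1 - 2p); requires p != 1/2."""
    r = sum(reported) / len(reported)
    return (r - p) / (1 - 2 * p)
```

The tracker's attack in the talk amounts to communicating over this noisy channel, so Shannon's coding theorems bound how many queries per data point any attack needs.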



Ryan Williams 
Title: Optimal k-Anonymity
using Generalization and Suppression is NP-Hard
Abstract:
k-Anonymization using generalization and suppression is
a method for protecting the privacy of individuals while
allowing data sharing. An optimal k-anonymization is one
that obtains the required privacy protection (ensuring
that every individual is a member of a group of k "indistinguishable"
individuals) while minimizing the amount of data loss resulting
from the privacy protection. We prove that finding an optimal
k-anonymization is NP-hard in general.
Preliminary presentation: (pdf)
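To make the definition concrete: checking whether a given table is k-anonymous is straightforward, as the sketch below shows; the hardness result concerns the optimization problem of choosing generalizations and suppressions with minimum data loss, not this verification step. The dict-based row representation is an illustrative assumption.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """A table (list of dicts) is k-anonymous with respect to the
    quasi-identifier columns if every combination of their values
    occurs in at least k rows, so each individual hides in a group
    of k indistinguishable records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in rows)
    return all(count >= k for count in groups.values())
```

An optimal anonymizer must search over ways of coarsening values (e.g. full ZIP code to ZIP prefix) until this check passes, while minimizing information loss; that search is what the talk proves NP-hard.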



Michael Shamos (CMU) 
Title: Mathematics and the
Privacy Laws
Abstract:
Real-world privacy problems are motivated by laws, committed
privacy policies, and social expectations. Statutes are not
expressed mathematically and can therefore impose inconsistent
requirements or mandate processes that do not fulfill their
stated objectives. In this talk we look at some recent privacy
laws and discuss the possibility (or not) of implementing
automated systems to administer them. This requires us to
look carefully at functions and databases to determine what
can be learned about them from queries and how metadata
about a function (e.g., knowing whether it has a fixed point)
can compromise the function.
Preliminary presentation: (ppt)
(html) (pdf)



Joe Kilian
NEC Labs, Princeton 
Title: Secure Computation
(A Survey)
Abstract:
This tutorial, geared to non-specialists, gives an elementary
overview of secure computation. What is it? What is it good
for? When is it possible? When is it practical? As it covers
two decades of research by a large community of researchers,
some details may be omitted.
Preliminary presentation: (ppt)
(html) (pdf)



Cynthia
Dwork 
Title: A Cryptography-Flavored
Approach to Privacy in Public Databases
Preliminary presentation: (pdf)



Bradley
Malin 
Title:
Trail Re-Identification: Learning Who You Are From Where
You Have Been
Abstract:
This talk provides algorithms for learning the identities
of individuals from the trails of seemingly anonymous information
they leave behind.
Consider online consumers, who have the IP addresses of
their computers logged at each website visited. Many falsely
believe they cannot be identified. The term "re-identification"
refers to correctly relating seemingly anonymous data to
explicitly identifying information (such as the name or
address) of the person who is the subject of those data.
Re-identification has historically been associated with
data released from a single data holder. This work extends
the concept to "trail re-identification," in which
a person is related to a trail of seemingly anonymous and
homogeneous data left across different locations. The three novel
algorithms presented in this talk perform trail re-identification
by exploiting the fact that some locations also capture
explicitly identifying information and subsequently provide
the unidentified data and the identified data as separate
data releases. Intersecting occurrences in these two kinds
of data can reveal identities. The algorithms
re-identify individuals based on the uniqueness of trails
across unidentified and identified datasets; they
differ in the amount of completeness and multiplicity assumed
in the data. Successful
re-identifications are reported for DNA sequences left by
hospital patients and for IP addresses left by online consumers.
These algorithms are extensible to tracking co-locations
of people, which is an objective of homeland defense surveillance.
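The intersection idea at the core of the abstract can be sketched in a few lines: link an anonymous trail to an identity only when exactly one identified trail is consistent with it. This is an illustrative, uniqueness-only simplification, not one of the talk's three algorithms, which additionally handle incomplete and multiply-occurring trails.

```python
def reidentify(unidentified, identified):
    """Link anonymous trails to identities by trail uniqueness.
    Both arguments map a key (e.g. a DNA sequence or an IP address,
    vs. a person's name) to the set of locations where it was seen.
    An identity is consistent with an anonymous trail if it appears
    at every location the anonymous datum does; a link is made only
    when exactly one identity is consistent."""
    links = {}
    for anon, locations in unidentified.items():
        candidates = [name for name, seen in identified.items()
                      if locations <= seen]
        if len(candidates) == 1:
            links[anon] = candidates[0]
    return links
```

Each new location a trail touches shrinks the candidate set, which is why seemingly anonymous data scattered across many data holders re-identifies so readily.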



Ran
Canetti
IBM
Research 
Title: Jointly Restraining
the Big Brother: Using Cryptography to Reconcile Privacy
with Data Aggregation
Abstract:
Large-scale aggregation of data regarding individuals may
be helpful in many cases. Examples include using medical
records for research purposes, using customer behavior to
improve and customize products and services, and using electronic
behavior patterns to help law enforcement. However, such
large-scale aggregation of data can easily compromise the
privacy of individuals and thus undermine the freedom of
our society. Consequently, one must provide mechanisms for
controlling the aggregation activity and for preventing
abuse and leakage of the aggregated data. Such tasks are
squarely within the domain of cryptographic protocol design.
This talk surveys the applicability of cryptographic techniques
for general protocol design to the above problems. We start
with the universal feasibility results from the 1980s and
review their strong points and weaknesses. We then concentrate
on more recent general techniques. These techniques provide
better composability, efficiency, and resilience to network
delays, and may thus be more suitable as a basis for actual
solutions. Finally, we discuss shortcomings of the above
techniques and some directions for further research. Part
of the talk is based on joint work with Lindell, Ostrovsky,
and Sahai.
Preliminary presentation: (ppt)
(html) (pdf)



Ralph
Gross 
Title: Preserving Privacy
by De-identifying Facial Images
Abstract:
In the context of sharing video surveillance data, a significant
threat to privacy is face recognition software, which can
automatically identify known people (from a driver's license
photo database, for example) and thereby track people regardless
of suspicion. This talk introduces an algorithm to protect
the privacy of individuals in video surveillance data by
de-identifying faces such that many facial characteristics
remain but the face cannot be reliably recognized. A trivial
solution to de-identifying faces involves blacking out each
face. This thwarts any possible face recognition, but because
all facial details are obscured, the result is of limited
use. Many ad hoc attempts, such as covering eyes or randomly
perturbing image pixels, fail to thwart face recognition
because of the robustness of face recognition methods. This
talk presents a new privacy-enabling algorithm, named k-Same,
that provably limits the ability of face recognition
software to reliably recognize faces while maintaining facial
details in the images. The algorithm determines similarity
between faces based on a distance metric and creates new
faces by averaging image components, which may be the original
image pixels (k-Same-Pixel) or eigenvectors (k-Same-Eigen).
Results are presented on a standard collection of real face
images with varying k.
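The averaging step of the pixel variant can be sketched as follows, treating each face as a flat list of pixel values. This greedy version (nearest k faces under Euclidean distance, pixelwise average) is an illustrative simplification and omits the paper's clustering details; the key property is that every released face is an average of k originals, so a recognizer cannot single out any one of them.

```python
def k_same_pixel(images, k):
    """k-Same-Pixel sketch: for each face (a flat list of pixel
    values), find its k closest faces under squared Euclidean
    distance and release the pixelwise average of that group."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    released = []
    for img in images:
        # k nearest faces, including the face itself
        cluster = sorted(images, key=lambda other: dist(img, other))[:k]
        # pixelwise mean over the cluster
        released.append([sum(px) / k for px in zip(*cluster)])
    return released
```

k-Same-Eigen applies the same averaging in an eigenface coordinate system instead of raw pixel space, which better preserves face-like detail.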



Samuel EdohoEket 
Title: Answering
"How Many?" Over a Distributed, Privacy-preserving
Surveillance Network
Abstract:
Collecting sensitive data from disparate data holders is
at the center of bioterrorism surveillance efforts in the
United States. Current approaches require data holders to
share data with a centralized authority so that the
authority can answer basic surveillance questions.
This talk examines an alternative in which answers to basic
surveillance questions are computed over a network of data
holder machines: the final result is provided to the central
authority, not the data elements themselves. These computations
are achieved while protecting the confidentiality of each
data holder's information and the privacy of the individuals
whose information is the subject of the computation. For
example, a simple surveillance question is the "how
many" question, such as "how many prescriptions
of drug x were sold today?" or "how many students
were absent from school today?" Requesting the answer
from schools and stores is often met with intense opposition
because of confidentiality concerns for the data holder's
enterprise. In this talk, I demonstrate a way to answer
these kinds of questions over a distributed network while
preserving the confidentiality of the data holders.
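One standard way to answer a distributed "how many" question without revealing individual counts is a ring-based secure sum, sketched below. This is a textbook protocol offered only as an illustration of the flavor of solution, not the talk's actual construction; it assumes honest, non-colluding data holders.

```python
import random

def secure_sum(values, modulus=10**9, rng=random):
    """Ring-based secure sum: the initiator starts the running
    total at a random offset known only to itself; each data
    holder adds its private count modulo `modulus` and passes the
    total on; the initiator subtracts the offset at the end. No
    party on the ring sees another holder's individual count.
    Assumes honest, non-colluding parties and sum(values) < modulus."""
    offset = rng.randrange(modulus)
    running = offset
    for v in values:  # each holder in turn adds its private count
        running = (running + v) % modulus
    return (running - offset) % modulus
```

Because every intermediate total is uniformly distributed modulo the modulus, a single eavesdropped message reveals nothing about any individual holder's count.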



Stephen E. Fienberg 
Title: Preserving Confidentiality
AND Providing Adequate Data for Statistical Modeling: The
Role of Partial and Perturbed Data
Abstract:
This presentation emphasizes the tradeoffs between disclosure
limitation and utility, and strategies for assessing both.
I will illustrate with our work on tables of counts.
Preliminary presentation: (ppt)
(html) (pdf)


