
Monday, May 24th 
8.30 am 
Continental Breakfast 
9.00 am 
Luis
von Ahn, Carnegie Mellon University
The ESP Game
(presentation ppt)
Computer programs cannot yet understand arbitrary images. We present a method that can assign meaningful and accurate word descriptions to any image. Attaching such descriptions to images on the Web would allow for more accurate image search engines, would improve the accessibility of sites (by providing descriptions of images to visually impaired individuals), and would help Web browsers block pornography. Our system is simple and will allow us to label all images on the Web extremely fast and inexpensively.
No knowledge of computer vision techniques (or of anything else) is required for this talk.

9.20 am 
Abie
Flaxman, Carnegie Mellon University
Two-stage stochastic programming on average: Minimum Spanning Trees
(presentation pdf)
In real-life optimization, the costs and constraints are sometimes not known exactly. Stochastic programming is one approach used in these situations.
Designing minimum spanning trees is a classical problem in combinatorial optimization. A remarkable fact about average-case minimum spanning trees is that when the costs of the edges between n nodes are selected independently and uniformly from [0,1], the cost of the MST converges to a constant, and the constant is \zeta(3) = 1/1^3 + 1/2^3 + 1/3^3 + ....
In the two-stage stochastic programming version of this problem, known are the costs of each edge on Monday and the distribution of the costs of each edge on Tuesday. The goal is to select a set of edges to buy on Monday so that when the tree is completed on Tuesday, the expected total cost is minimized.
This talk will focus on the two-stage version where the Monday and Tuesday costs are all selected independently and uniformly from [0,1]. In this case, a simple threshold heuristic produces a solution with expected cost \zeta(3) - 1/2. This is not the optimal two-stage solution, however.
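As an aside, the \zeta(3) limit for the one-stage random MST is easy to observe empirically. The sketch below is an illustration of that limit only (it is not from the talk, and it does not implement the two-stage threshold heuristic): it runs Prim's algorithm on a complete graph whose edge weights are drawn independently and uniformly from [0,1].

```python
import heapq
import random

def random_mst_cost(n, rng):
    """Cost of the MST of the complete graph K_n with i.i.d.
    Uniform[0,1] edge weights, via Prim's algorithm with a lazy heap."""
    # Draw the full symmetric weight matrix up front for clarity.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.random()
    in_tree = [False] * n
    heap = [(0.0, 0)]          # (edge weight to tree, vertex)
    total, added = 0.0, 0
    while added < n:
        cost, v = heapq.heappop(heap)
        if in_tree[v]:
            continue           # stale heap entry
        in_tree[v] = True
        total += cost
        added += 1
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(heap, (w[v][u], u))
    return total

rng = random.Random(0)
avg = sum(random_mst_cost(300, rng) for _ in range(3)) / 3
print(avg)  # close to zeta(3) = 1.2020569... for large n
```

Even at n = 300 the average is already within a few percent of \zeta(3).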

9.40 am 
Ryan
Williams, Carnegie Mellon University
k-Anonymity
(presentation pdf)
The technique of k-anonymization has been proposed in the literature as an alternative way to release public information, while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database, when k is constant. In particular, it is an O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
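For readers unfamiliar with the model, the definition is easy to state in code. The sketch below illustrates only the definition and the suppression operation, not the approximation algorithms of the talk; the toy relation and attribute names are invented for illustration.

```python
from collections import Counter

def is_k_anonymous(rows, k):
    """A relation is k-anonymous if every row's tuple of (possibly
    suppressed) attribute values occurs at least k times."""
    counts = Counter(tuple(r) for r in rows)
    return all(c >= k for c in counts.values())

# Toy relation with attributes (age bracket, zip code).
# Suppressing an entry replaces it with '*'.
table = [
    ("20s", "15213"),
    ("20s", "15217"),
    ("30s", "15213"),
    ("30s", "15213"),
]
print(is_k_anonymous(table, 2))  # False: the first two rows are unique

# Suppress the zip code in the first two rows (two deleted entries).
suppressed = [("20s", "*"), ("20s", "*"),
              ("30s", "15213"), ("30s", "15213")]
print(is_k_anonymous(suppressed, 2))  # True
```

The NP-hard optimization problem is to achieve this with the minimum number of suppressed entries.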

10.00 am 
Mugizi
Robert Rwebangira, Carnegie Mellon University
Semi-Supervised Learning Using Randomized Mincuts
(presentation ppt)
In many application domains there is a large amount of unlabeled data but only a very limited amount of labeled training data. One general approach that has been explored for utilizing this unlabeled data is to construct a graph on all the data points based on distance relationships among examples, and then to use the known labels to perform some type of graph partitioning. One natural partitioning to use is the minimum cut that agrees with the labeled data, which can be thought of as giving the most probable label assignment if one views labels as generated according to a Markov Random Field on the graph. Zhu et al. propose a cut based on a relaxation of this field, and Joachims gives an algorithm based on finding an approximate min-ratio cut.
In this paper, we extend the mincut approach by adding randomness to the graph structure. The resulting algorithm addresses several shortcomings of the basic mincut approach, and can be given theoretical justification from both a Markov random field perspective and from sample complexity considerations. In cases where the graph does not have small cuts for a given classification problem, randomization may not help. However, our experiments on several datasets show that when the structure of the graph supports small cuts, this can result in highly accurate classifiers with good accuracy/coverage tradeoffs. In addition, we are able to achieve good performance with a very simple graph-construction procedure.
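The basic (non-randomized) mincut step can be sketched concretely: attach a super-source to the positively labeled points and a super-sink to the negatively labeled ones, and take a minimum s-t cut of the similarity graph. The toy implementation below uses Edmonds-Karp max-flow; it illustrates only the plain mincut baseline, not the randomized variant of the paper, and the example graph and capacities are invented for illustration.

```python
from collections import deque

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max-flow on an adjacency-matrix capacity graph;
    returns the source side of a minimum s-t cut."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]

    def bfs():
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    while True:
        parent = bfs()
        if parent[t] == -1:
            break
        # Bottleneck along the augmenting path, then push flow.
        v, aug = t, float("inf")
        while v != s:
            aug = min(aug, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            flow[parent[v]][v] += aug
            flow[v][parent[v]] -= aug
            v = parent[v]
    parent = bfs()
    return {v for v in range(n) if parent[v] != -1}

# Toy example: two clusters of points joined by one weak edge.
# Data points 0..5; point 0 is labeled +, point 5 is labeled -.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
S, T = 6, 7                      # super-source and super-sink
cap = [[0] * 8 for _ in range(8)]
for u, v in edges:
    cap[u][v] = cap[v][u] = 1    # unit similarity edges
cap[S][0] = cap[5][T] = 100      # effectively infinite label edges
positives = sorted(min_cut_source_side(cap, S, T) - {S})
print(positives)  # [0, 1, 2]: the min cut severs only the edge (2, 3)
```

The paper's algorithm repeats such cuts over randomly perturbed graphs and aggregates the resulting labelings.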

10.20 am 
Break 
11.00 am 
Umut
Acar, Carnegie Mellon University
Self-Adjusting Computation
(presentation ppt)
I will present techniques for writing and analyzing programs that automatically adjust their output to changes in their input and consider several applications.

11.20 am 
Renato
Werneck, Princeton University
Self-Adjusting Top Trees
(presentation pdf)
Authors: Robert E. Tarjan and Renato F. Werneck
Consider a forest on n vertices that changes over time. Edge
insertions and removals can occur in any order, as long as no
cycle is ever created. In this setting, we are interested in
maintaining path-related or subtree-related information. For example,
given the path between any two vertices in the same tree, we may
want to add a certain value to all edges in the path, or find
the one with minimum cost. This problem and closely related
ones often arise in algorithms for network optimization and
dynamic graphs. Top Trees, proposed by Alstrup et al., are the most
generic data structure to maintain dynamic trees, since they can
handle both path- and subtree-related queries on trees with arbitrary
degrees in O(log n) time. Although the data structure can be
implemented as an interface for Frederickson's Topology Trees,
this extra layer may hurt its performance in practice. In this talk,
we present a direct implementation that uses splaying and runs in
O(log n) amortized time per operation.
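To make the operation set concrete, here is a naive baseline (my own sketch, not the talk's data structure) that supports link, cut, path-minimum, and path-addition in O(n) time per operation; the contribution of self-adjusting top trees is doing all of this in O(log n) amortized time.

```python
from collections import deque

class NaiveDynamicForest:
    """O(n)-per-operation baseline for dynamic-tree operations."""

    def __init__(self):
        self.adj = {}                    # vertex -> {neighbor: edge cost}

    def add_vertex(self, v):
        self.adj.setdefault(v, {})

    def link(self, u, v, cost):
        assert self.path(u, v) is None, "linking would create a cycle"
        self.adj[u][v] = self.adj[v][u] = cost

    def cut(self, u, v):
        del self.adj[u][v]
        del self.adj[v][u]

    def path(self, u, v):
        """Edges on the u-v path, or None if u and v are disconnected."""
        parent = {u: None}
        q = deque([u])
        while q:
            x = q.popleft()
            if x == v:
                edges = []
                while parent[x] is not None:
                    edges.append((parent[x], x))
                    x = parent[x]
                return edges[::-1]
            for y in self.adj[x]:
                if y not in parent:
                    parent[y] = x
                    q.append(y)
        return None

    def path_min(self, u, v):
        """Minimum edge cost on the u-v path."""
        return min(self.adj[a][b] for a, b in self.path(u, v))

    def path_add(self, u, v, delta):
        """Add delta to every edge cost on the u-v path."""
        for a, b in self.path(u, v):
            self.adj[a][b] += delta
            self.adj[b][a] += delta

f = NaiveDynamicForest()
for v in "abcd":
    f.add_vertex(v)
f.link("a", "b", 3); f.link("b", "c", 1); f.link("c", "d", 5)
print(f.path_min("a", "d"))  # 1
f.path_add("a", "c", 2)      # a-b becomes 5, b-c becomes 3
print(f.path_min("a", "d"))  # 3
```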

11.40 am 
Loukas
Georgiadis, Princeton University
Finding Dominators in Flowgraphs: A Linear-Time Algorithm and
an Experimental Study (presentation ppt)
A flowgraph $G=(V,A,r)$ is a directed graph such that every vertex $v \in V$ is reachable from a distinguished start vertex $r \in V$. Vertex $w$ dominates vertex $v$ if every path from $r$ to $v$ includes $w$. Our goal is to compute for each vertex $v$ the set of vertices that dominate $v$. The computation of dominators has applications in program optimization, circuit testing, and other areas.
In this talk I will give a brief overview of a linear-time algorithm for finding dominators (this part is joint work with Robert Tarjan). Then I will present experimental results for various algorithms that solve this problem efficiently in practice (this part is joint work with Renato Werneck, Robert Tarjan, Spyros Triantafyllis and David August).
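The definition of dominators translates directly into the classic iterative data-flow algorithm, which is simple but far from linear time. The sketch below illustrates the definition only, not the linear-time algorithm of the talk.

```python
def dominators(vertices, edges, root):
    """Naive fixpoint: Dom(v) = {v} united with the intersection of
    Dom(p) over all predecessors p of v. Classic O(V*E)-style
    data-flow algorithm."""
    preds = {v: set() for v in vertices}
    for u, v in edges:
        preds[v].add(u)
    # Initialize every non-root vertex to "dominated by everything".
    dom = {v: ({root} if v == root else set(vertices)) for v in vertices}
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v == root or not preds[v]:
                continue
            new = {v} | set.intersection(*(dom[p] for p in preds[v]))
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom

# Flowgraph with root r: r -> a, r -> b, a -> c, b -> c.
dom = dominators(["r", "a", "b", "c"],
                 [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")], "r")
print(sorted(dom["c"]))  # ['c', 'r']: neither a nor b dominates c
```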
12.00 noon 
Lunch 
1.30 pm 
Srinath
Sridhar, Carnegie Mellon University
Evaluation of the Haplotype Motif Model using the
Principle of Minimum Description Length (presentation ppt)
We examine haplotype structure in terms of
the recently introduced haplotype motif model, which relaxes some
of the assumptions of the popular haplotype block model in order
to more fully capture true patterns of haplotype conservation. Here
we present a formulation of the haplotype motif structure problem in terms
of the minimum description length (MDL) principle and introduce an expectation
maximization algorithm to infer the parameters of the model.
Comparisons between block and motif models on simulated data reveal
that motifs provide more parsimonious explanations of realistic data
sets, particularly as data set sizes increase. In our experiments
with large data sets, the motif model required 25-30% fewer bits than
blocks. Our results suggest that relaxing the assumptions of the block model
will lead to better characterization of the nature of haplotype conservation
and thus, likely, to improved methods for disease association study design
and analysis.

1.50 pm 
Kedar
Dhamdhere, Carnegie Mellon University
Approximating additive distortion of metric embeddings (presentation pdf)
We consider the problem of fitting input distance data to a line metric. The quality of the fit is the total additive error (also called additive distortion) in the data. We give an O(log n) approximation for the problem of minimizing the additive distortion. This problem was shown to be NP-hard by Saxe. We also show how to extend this result to the sum of squared errors (and other L_p norms of the error).

2.10 pm 
Taka
Osogami, Carnegie Mellon University
Recursive dimensionality reduction for the analysis of multi-server scheduling policies (presentation ppt)
We consider common scheduling policies for multi-server systems including cycle stealing policies, priority queuing, task assignment policies, and threshold policies. Even these simple policies are already very difficult to analyze because their underlying Markov chain structure grows infinitely in more than one dimension: the dimension of the Markov chain is typically equal to the number of servers. We introduce a new analysis technique, which we call Recursive Dimensionality Reduction (RDR), that allows us to reduce an n-dimensionally infinite Markov chain to a 1-dimensionally infinite Markov chain, which can then be solved via numerical methods. We apply RDR to analyze the above scheduling policies, leading to new insights about these policies and proposals for improved policies.

2.30 pm 
Peter
Richter, Carnegie Mellon University
Lower Bounds for Graph Embeddings and Combinatorial Preconditioners (presentation pdf)
Given a general graph $G$, a fundamental problem is to find a spanning tree or subgraph $H$ that best approximates $G$ w.r.t. some measure. Often this measure is some combination of the congestion and dilation of an embedding of $G$ into $H$. We consider the (generalized) condition number $\kappa_f(G,H) \leq O({\rm congestion} \cdot {\rm dilation})$, the square root of which bounds the number of steps needed to solve a linear system $G x = b$ preconditioned by $H$ iteratively. Upper bounds for approximating a general graph by a subgraph w.r.t. congestion, dilation, and condition number are known.
We look at lower bounds for this problem; in particular, what makes a particular graph $G$ hard to approximate well? For congestion, it is large separators; for dilation, it is long (isometric) cycles. We show that for the condition number, the existence of large separators or long cycles in $G$ is sufficient but not necessary for it to be hard to approximate well. The main result here is a nearly tight lower bound for $G$ equal to the square mesh. The proof uses elementary concepts from electrical circuit theory, linear algebra, and geometry.
Joint work with Gary Miller, to appear in SPAA 2004.

2.50 pm 
Break 
3.20 pm 
Todd
Phillips, Carnegie Mellon University
A Bezier-Based Approach to Unstructured Moving Meshes (presentation pdf)
I will present a framework being developed for maintaining the quality of two-dimensional triangular moving meshes. The use of curved elements is the key idea that allows us to avoid excessive refinement and still obtain good quality meshes consisting of a low number of well-shaped elements. We use B-spline curves to model object boundaries, and objects are meshed with second-order Bezier triangles. As the mesh evolves according to a non-uniform flow velocity field, we keep track of object boundaries and, if needed, carefully modify the mesh to keep it well shaped by applying a combination of vertex insertion and deletion, edge flipping, and edge smoothing operations at each time step. Our algorithms for these tasks are extensions of known algorithms for meshes built of straight-sided elements and are designed for any fixed-order Bezier elements and B-splines. Although in this work we have concentrated on quadratic elements, most of the operations are valid for elements of any order and they generalize well to higher dimensions. We present results of our scheme for a set of objects mimicking red blood cells subject to a precomputed flow velocity field.

3.40 pm 
David
Cardoze (Postdoc), Carnegie Mellon University
Topological Representations for Meshes (presentation pdf)
The ability to represent and manipulate topological structures in a computer is central to many areas such as computational geometry, computer graphics, solid modeling and scientific visualization. In this talk we describe some approaches to represent the topological structure of meshes, and discuss their advantages and limitations.

4.00 pm 
Konstantin
Andreev, Carnegie Mellon University
Balanced Graph Partitioning (presentation ppt)
In this paper we consider the problem of $(k,\nu)$-balanced graph
partitioning: dividing the vertices of a graph into $k$ almost equal size
components (each of size less than $\nu \cdot \frac{n}{k}$) so that the
capacity of edges between different components is minimized.
This problem is a natural generalization of several other problems such as
minimum bisection, which is the $(2,1)$-balanced partitioning problem.
We present a bicriteria polynomial time approximation algorithm with an
$O(\log^2{n})$-approximation for any constant $\nu>1$. For $\nu =1$ we
show that no polynomial-time approximation algorithm can guarantee a finite
approximation ratio unless $P=NP$. Previous work has only considered the
$(k,\nu)$-balanced partitioning problem for $\nu \geq 2$.
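The special case of minimum bisection, the (2,1)-balanced problem, is easy to illustrate by exhaustive search on a toy graph. The sketch below is for intuition only: it is exponential in n, and it uses unweighted edges as stand-ins for unit capacities.

```python
from itertools import combinations

def min_bisection(n, edges):
    """Exhaustive minimum bisection: split n vertices (n even) into two
    equal halves minimizing the number of crossing edges.
    Exponential time; no polynomial-time algorithm can approximate the
    exactly balanced case within any finite ratio unless P = NP."""
    best_cut, best_side = None, None
    for side in combinations(range(n), n // 2):
        s = set(side)
        cut = sum(1 for u, v in edges if (u in s) != (v in s))
        if best_cut is None or cut < best_cut:
            best_cut, best_side = cut, s
    return best_cut, best_side

# Two triangles joined by a single edge: the optimal bisection
# cuts exactly that joining edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cut, side = min_bisection(6, edges)
print(cut)  # 1
```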

4.20 pm 
Discussion
Ideas for Future PROBEs 
Lamps of ALADDIN Annual Project Review
