Carnegie Mellon School of Computer Science
Lamps of ALADDIN Annual Project Review: Abstracts

Monday, May 24th

8.30 am  Continental Breakfast

9.00 am  Luis von Ahn, Carnegie Mellon University
The ESP Game (presentation ppt)

Computer programs cannot yet understand arbitrary images. We present a method that can assign meaningful and accurate word descriptions to any image. Attaching such descriptions to images on the Web would allow for more accurate image search engines, would improve the accessibility of sites (by providing descriptions of images to visually impaired individuals), and would help Web browsers block pornography. Our system is simple and will allow us to label all images on the Web extremely quickly and inexpensively. No knowledge of computer vision techniques (or of anything else) is required for this talk.

9.20 am  Abie Flaxman, Carnegie Mellon University
Two-Stage Stochastic Programming on Average: Minimum Spanning Trees (presentation pdf)

In real-life optimization, the costs and constraints are sometimes not known exactly. Stochastic programming is one approach used in these situations. Designing minimum spanning trees is a classical problem in combinatorial optimization. A remarkable fact about average-case minimum spanning trees is that when the costs of the edges between n nodes are selected independently and uniformly from [0,1], the cost of the MST converges to a constant, namely \zeta(3) = 1/1^3 + 1/2^3 + 1/3^3 + .... In the two-stage stochastic programming version of this problem, the Monday cost of each edge is known exactly, but only the distribution of each edge's Tuesday cost is known. The goal is to select a set of edges to buy on Monday so that when the tree is completed on Tuesday, the expected total cost is minimized. This talk will focus on the two-stage version where the Monday and Tuesday costs are all selected independently and uniformly from [0,1]. In this case, a simple threshold heuristic produces a solution with expected cost \zeta(3) - 1/2. This is not the optimal two-stage solution, however.
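The \zeta(3) limit mentioned in Flaxman's abstract is easy to check numerically. The sketch below (illustrative only, not material from the talk) runs Prim's algorithm on a complete graph whose Uniform[0,1] edge weights are sampled lazily, exactly once per edge, and compares the average MST cost to \zeta(3) ≈ 1.2021:

```python
import random

def random_mst_cost(n, rng):
    # Prim's algorithm on the complete graph K_n. Each edge weight is an
    # i.i.d. Uniform[0,1] draw, sampled the first (and only) time the edge
    # crosses the tree/fringe boundary -- equivalent to fixing all weights
    # up front.
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # add the cheapest fringe vertex to the tree
        v = min((u for u in range(n) if not in_tree[u]), key=lambda u: best[u])
        in_tree[v] = True
        total += best[v]
        for u in range(n):
            if not in_tree[u]:
                w = rng.random()  # weight of edge (v, u), sampled exactly once
                if w < best[u]:
                    best[u] = w
    return total

rng = random.Random(42)
estimate = sum(random_mst_cost(400, rng) for _ in range(5)) / 5
zeta3 = sum(1 / k**3 for k in range(1, 10000))  # partial sum; tail is negligible
print(f"estimated E[MST cost] = {estimate:.3f}, zeta(3) = {zeta3:.3f}")
```

Even at n = 400 the empirical average sits within a few hundredths of \zeta(3).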
9.40 am  Ryan Williams, Carnegie Mellon University
k-Anonymity (presentation pdf)

The technique of k-anonymization has been proposed in the literature as an alternative way to release public information while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version, which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial-time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database when k is constant. In particular, it is an O(k log k)-approximation, where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this exponential dependence on k, but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.

10.00 am  Mugizi Robert Rwebangira, Carnegie Mellon University
Semi-Supervised Learning Using Randomized Mincuts (presentation ppt)

In many application domains there is a large amount of unlabeled data but only a very limited amount of labeled training data. One general approach that has been explored for utilizing this unlabeled data is to construct a graph on all the data points based on distance relationships among examples, and then to use the known labels to perform some type of graph partitioning. One natural partitioning to use is the minimum cut that agrees with the labeled data, which can be thought of as giving the most probable label assignment if one views labels as generated according to a Markov random field on the graph. Zhu et al. propose a cut based on a relaxation of this field, and Joachims gives an algorithm based on finding an approximate min-ratio cut. In this paper, we extend the mincut approach by adding randomness to the graph structure.
The resulting algorithm addresses several shortcomings of the basic mincut approach, and can be given theoretical justification both from a Markov random field perspective and from sample complexity considerations. In cases where the graph does not have small cuts for a given classification problem, randomization may not help. However, our experiments on several datasets show that when the structure of the graph supports small cuts, this approach can yield highly accurate classifiers with good accuracy/coverage tradeoffs. In addition, we are able to achieve good performance with a very simple graph-construction procedure.

10.20 am  Break

11.00 am  Umut Acar, Carnegie Mellon University
Self-Adjusting Computation (presentation ppt)

I will present techniques for writing and analyzing programs that automatically adjust their output to changes in their input, and consider several applications.

11.20 am  Renato Werneck, Princeton University
Self-Adjusting Top Trees (presentation pdf)
Authors: Robert E. Tarjan and Renato F. Werneck

Consider a forest on n vertices that changes over time. Edge insertions and removals can occur in any order, as long as no cycle is ever created. In this setting, we are interested in maintaining path-related or subtree-related information. For example, given the path between any two vertices in the same tree, we may want to add a certain value to all edges on the path, or find the edge with minimum cost. This problem and closely related ones often arise in algorithms for network optimization and dynamic graphs. Top Trees, proposed by Alstrup et al., are the most generic data structure for maintaining dynamic trees, since they can handle both path- and subtree-related queries on trees with arbitrary degrees in O(log n) time. Although the data structure can be implemented as an interface on top of Frederickson's Topology Trees, this extra layer may hurt its performance in practice.
In this talk, we present a direct implementation that uses splaying and runs in O(log n) amortized time per operation.

11.40 am  Loukas Georgiadis, Princeton University
Finding Dominators in Flowgraphs: A Linear-Time Algorithm and an Experimental Study (presentation ppt)

A flowgraph $G=(V,A,r)$ is a directed graph such that every vertex $v \in V$ is reachable from a distinguished start vertex $r \in V$. Vertex $w$ dominates vertex $v$ if every path from $r$ to $v$ includes $w$. Our goal is to compute for each vertex $v$ the set of vertices that dominate $v$. The computation of dominators has applications in program optimization, circuit testing, and other areas. In this talk I will give a brief overview of a linear-time algorithm for finding dominators (this part is joint work with Robert Tarjan). Then I will present experimental results for various algorithms that solve this problem efficiently in practice (this part is joint work with Renato Werneck, Robert Tarjan, Spyros Triantafyllis, and David August).

12.00 noon  Lunch

1.30 pm  Srinath Sridhar, Carnegie Mellon University
Evaluation of the Haplotype Motif Model Using the Principle of Minimum Description (presentation ppt)

We examine haplotype structure in terms of the recently introduced haplotype motif model, which relaxes some of the assumptions of the popular haplotype block model in order to more fully capture true patterns of haplotype conservation. Here we present a formulation of the haplotype motif structure problem in terms of the minimum description length (MDL) principle and introduce an expectation-maximization algorithm to infer the parameters of the model. Comparisons between block and motif models on simulated data reveal that motifs provide more parsimonious explanations of realistic data sets, particularly as data set sizes increase. In our experiments with large data sets, the motif model required 25-30% fewer bits than blocks.
Our results suggest that relaxing the assumptions of the block model will lead to better characterization of the nature of haplotype conservation and thus, likely, to improved methods for disease association study design and analysis.

1.50 pm  Kedar Dhamdhere, Carnegie Mellon University
Approximating Additive Distortion of Metric Embeddings (presentation pdf)

We consider the problem of fitting input distance data to a line metric. The quality of the fit is the total additive error (also called additive distortion) in the data. We give an O(log n) approximation for the problem of minimizing the additive distortion. This problem was shown to be NP-hard by Saxe. We also show how to extend this result to the sum of squared errors (and other L_p norms of the error).

2.10 pm  Taka Osogami, Carnegie Mellon University
Recursive Dimensionality Reduction for the Analysis of Multiserver Scheduling Policies (presentation ppt)

We consider common scheduling policies for multiserver systems, including cycle stealing policies, priority queueing, task assignment policies, and threshold policies. Even these simple policies are already very difficult to analyze because their underlying Markov chain structure grows infinitely in more than one dimension -- the dimension of the Markov chain is typically equal to the number of servers. We introduce a new analysis technique, which we call Recursive Dimensionality Reduction (RDR), that allows us to reduce an n-dimensionally infinite Markov chain to a 1-dimensionally infinite Markov chain, which can then be solved via numerical methods. We apply RDR to analyze the above scheduling policies, leading to new insights about these policies and proposals for improved policies.

2.30 pm  Peter Richter, Carnegie Mellon University
Lower Bounds for Graph Embeddings and Combinatorial Preconditioners (presentation pdf)

Given a general graph $G$, a fundamental problem is to find a spanning tree or subgraph $H$ that best approximates $G$ w.r.t. some measure.
Often this measure is some combination of the congestion and dilation of an embedding of $G$ into $H$. We consider the (generalized) condition number $\kappa_f(G,H) \leq O({\rm congestion} \cdot {\rm dilation})$, the square root of which bounds the number of steps needed to iteratively solve a linear system $Gx = b$ preconditioned by $H$. Upper bounds for approximating a general graph by a subgraph w.r.t. congestion, dilation, and condition number are known. We look at lower bounds for this problem; in particular, what makes a particular graph $G$ hard to approximate well? For congestion, it is large separators; for dilation, it is long (isometric) cycles. We show that for the condition number, the existence of large separators or long cycles in $G$ is sufficient but not necessary for it to be hard to approximate well. The main result here is a nearly tight lower bound for $G$ equal to the square mesh. The proof uses elementary concepts from electrical circuit theory, linear algebra, and geometry. Joint work with Gary Miller, to appear in SPAA 2004.

2.50 pm  Break

3.20 pm  Todd Phillips, Carnegie Mellon University
A Bezier-Based Approach to Unstructured Moving Meshes (presentation pdf)

I will present a framework being developed for maintaining the quality of two-dimensional triangular moving meshes. The use of curved elements is the key idea that allows us to avoid excessive refinement and still obtain good-quality meshes consisting of a small number of well-shaped elements. We use B-spline curves to model object boundaries, and objects are meshed with second-order Bezier triangles. As the mesh evolves according to a non-uniform flow velocity field, we keep track of object boundaries and, if needed, carefully modify the mesh to keep it well shaped by applying a combination of vertex insertion and deletion, edge flipping, and edge smoothing operations at each time step.
Our algorithms for these tasks are extensions of known algorithms for meshes built of straight-sided elements and are designed for any fixed-order Bezier elements and B-splines. Although in this work we have concentrated on quadratic elements, most of the operations are valid for elements of any order, and they generalize well to higher dimensions. We present results of our scheme for a set of objects mimicking red blood cells subject to a precomputed flow velocity field.

3.40 pm  David Cardoze (Postdoc), Carnegie Mellon University
Topological Representations for Meshes (presentation pdf)

The ability to represent and manipulate topological structures in a computer is central to many areas such as computational geometry, computer graphics, solid modeling, and scientific visualization. In this talk we describe some approaches to representing the topological structure of meshes, and discuss their advantages and limitations.

4.00 pm  Konstantin Andreev, Carnegie Mellon University
Balanced Graph Partitioning (presentation ppt)

In this paper we consider the problem of $(k,\nu)$-balanced graph partitioning: dividing the vertices of a graph into $k$ almost equal-size components (each of size less than $\nu \cdot \frac{n}{k}$) so that the capacity of the edges between different components is minimized. This problem is a natural generalization of several other problems, such as minimum bisection, which is the $(2,1)$-balanced partitioning problem. We present a bicriteria polynomial-time approximation algorithm with an $O(\log^2{n})$ approximation for any constant $\nu>1$. For $\nu=1$ we show that no polynomial-time approximation algorithm can guarantee a finite approximation ratio unless $P=NP$. Previous work has only considered the $(k,\nu)$-balanced partitioning problem for $\nu \geq 2$.

4.20 pm  Discussion: Ideas for Future PROBEs
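To make the (k, \nu)-balanced partitioning objective from Andreev's abstract concrete, here is a small illustrative sketch (not the authors' algorithm): given a candidate partition of the vertices, it checks the balance constraint and computes the capacity of the edges cut. The graph, function names, and the use of a non-strict size bound are assumptions for illustration.

```python
def is_balanced(parts, n, k, nu):
    # (k, nu)-balance: exactly k parts, each of size at most nu * n / k
    # (the abstract's size constraint, taken non-strictly here).
    return len(parts) == k and all(len(p) <= nu * n / k for p in parts)

def cut_capacity(parts, edges):
    # Total capacity of edges whose endpoints lie in different parts.
    side = {v: i for i, part in enumerate(parts) for v in part}
    return sum(c for u, v, c in edges if side[u] != side[v])

# Toy instance: two unit-capacity triangles {0,1,2} and {3,4,5}
# joined by the single edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
parts = [{0, 1, 2}, {3, 4, 5}]
print(is_balanced(parts, n=6, k=2, nu=1), cut_capacity(parts, edges))  # True 1
```

For this instance the partition shown is the (2,1)-balanced partition (i.e., the minimum bisection): it cuts only the single bridging edge.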

This material is based upon work supported by the National Science Foundation under Grant No. 0122581. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.