CENTER Carnegie Mellon UniversityCarnegie Mellon Computer Science DepartmentSchool of Computer Science
Abstracts Web Structure and Algorithms Workshop
Related Activities
Outreach Roadshow


Friday April 9
10.00A.M. Coffee
10.20A.M. Guy Blelloch, Carnegie Mellon
Welcome and Introductions

Michael Kearns, University of Pennsylvania
The Economics of Social Networks

In an attempt to explain a number of apparently universal statistical properties of natural social and economic networks, many authors have proposed rich families of probabilistic generative models for social network formation. We have recently introduced a graph-theoretic representation for classical mathematical economic models. This work studies the marriage of these two lines of inquiry: we examine the network effects that social network models (such as preferential attachment) introduce into classical economic settings. Of particular interest are the relationships between the statistical structure of the network, and economic metrics such as wealth distribution and price variation. Our findings are a mixture of theoretical results and large-scale simulations that have been made possible only by algorithmic advances of the last two years.

Joint work with Sham Kakade, Luis Ortiz, Robin Pemantle, and Sid Suri.


Milena Mihail, Georgia Tech
Conductance of Power Law and Scale Free Graphs

Abstract: We generalize the classical result of expansion of random regular graphs for graphs whose degree sequences follow heavy tailed statistics. Such graphs arise in complex networks like the Internet, the WWW and biological networks. In particular, we establish constant conductance in the configurational random graph model known as power law random graph, and in the evolutionary model of growth with preferential attachment, under the assumption that the minimum degree is at least three. As an immediate consequence, we get separation of 1st and 2nd eigenvalues of the stochastic normalization of the adjacency matrix of the graph, with many further algorithmic implications, as well as many open problems, in the corresponding application areas.

12.30P.M. Lunch

Lincoln Lu, UC San Diego
Spectra of random power law graphs (pdf1), (pdf2), (pdf3)

Abstract: In the study of the spectra of power law graphs, there are basically two competing approaches. One is to prove analogues of Wigner's semi-circle law while the other predicts that the eigenvalues follow a power law distributions. Although the semi-circle law and the power law have nothing in common, we will show that both approaches are essentially correct if one considers the appropriate matrices. We will prove that (under certain mild conditions) the eigenvalues of the (normalized) Laplacian of a random power law graph follow the semi-circle law while the spectrum of the adjacency matrix of a power law graph obeys the power law. Our results are based on the analysis of random graphs with given expected degrees and their relations to several key invariants. Of interest are a number of (new) values for the exponent $\beta$ where phase transitions for eigenvalue distributions occur. The spectrum distributions have direct implications to numerous graph algorithms such as randomized algorithms that involve rapidly mixing Markov chains, for example.

This is a joint work with Lincoln Lu, Fan Chung, and Van Vu.


Sanjeev Arora, Princeton University
Expander flows and a sqrt{log n}-approximation for graph expansion/sparsest cut

Partitioning a graph into two (or more) large pieces while minimizing the size of the ``interface'' between them is a fundamental combinatorial problem. Graph partitions or separators are central objects of study in the theory of Markov chains, geometric embeddings and are a natural algorithmic primitive in numerous settings, including clustering, divide and conquer approaches, PRAM emulation, VLSI layout, and packet routing in distributed networks.

Classical algorithms for these problems include eigenvalue approaches (Cheeger 1970, Alon 1985) and multicommodity flows (Leighton and Rao 1988). The latter yields a O(log n)-approximation, and it has been an open problem to improve this ratio. We describe a new O(\sqrt{log n})-approximation algorithm.

The algorithm relies on semidefinite relaxations that use the triangle inequality constraints. Analysing these relaxations had been an important open problem as well.

We also introduce the notion of {\em expander flows}, which constitute an interesting ``certificate'' of a graph's expansion. We use them to prove a surprising new theorem about graph embeddings: for every n-node graph it is possible to embed an n-node expander graph in it with appropriate dilation and congestion. We think this embedding theorem may have other applications.

Our techniques are an interesting mix of graph theory and high-dimensional geometry, including the use of "measure concentration." The talk will be self-contained, and showcases a newer proof that is simpler than the one in earlier versions of the paper.

(Joint work with Satish Rao and Umesh Vazirani; to appear at ACM STOC 2004)

4.00P.M. Coffee

Andrei Broder, IBM Research
Sic Transit Gloria Telae: Towards an Understanding of the Webs Decay (pdf)

Abstract: The rapid growth of the World Wide Web has been noted, tracked, and modeled extensively.  However, recent studies have documented the dual phenomenon: web pages have short half lives, and thus the Web exhibits rapid death as well.  Consequently, page creators are faced with an increasingly difficult task of keeping links and content up-to-date: many pages are seldom updated, becoming ``decayed'', and less effective as information resources.

In this work we formalize a page ``decay'' measure based solely on the Web graph properties and present algorithms for computing it efficiently.  We explore this measure by presenting a number of validations, and use it to identify interesting artifacts on today's Web.  We then describe a number of potential applications of interest to search engines, web page maintainers, ontologists, and individual users.

There are many open questions: Can this measure be made more accurate by combining it with NLP page content analysis? Can we make decay inferences from observed page changes over a long time period?  What is a good birth/death model for the Web graph that agrees with our empirical observations?  Time permitting we will explore these and more.

This is joint work with Ziv Bar-Yossef, Ravi Kumar, and Andrew Tomkins.

7.00P.M. Workshop Dinner
Saturday April 10
8.45A.M. Coffee

Andrew Tomkins, IBM Almaden
Online Conversations: Aliases, Topics, and Propagation Networks (pdf1), (pdf2)

Abstract: In this talk, we consider three aspects of online conversations across the web, as seen in bulletin boards, blogs, and communities. First, we consider the viability of pseudonyms, or aliases, as a mechanism to allow users to participate anonymously in these conversations. We show algorithms that are disturbingly effective at linking together the multiple aliases of a single author through analysis of the message content. Next, we consider the structure of the topics that emerge in online conversations, and present a recursive formulation of conversation structure based on an experimental study. Finally, we present a simple model for the flow of a topic from one user to another based on the theory of infectious diseases, and we present a simple iterative algorithm to learn such models from observations of conversations.

Variously joint work with R. Guha, Daniel Gruhl, David Liben-Nowell, Jasmine Novak, and Prabhakar Raghavan.

10.15A.M. Coffee

Fan Chung Graham
Coupling on-line and off-line analyses for random power law graphs

Abstract: We develop a coupling technique for analyzing on-line models by using off-line models. This method is especially effective for a growth-deletion model which generalizes and includes the preferential attachment model for generating large complex networks that simulate numerous realistic networks. The analysis has made use of two tools -- a general framework for comparing different random graph models and a general Martigale inequality that requires only a weak Lipschitz condition. By coupling the on-line model with the off-line model for random power law graphs, we derive bounds for a number of graph properties including diameter, average distances, connected components and spectral bounds. For example, we prove that a power law graph generated by the growth-deletion model almost surely has diameter $ O(\log n)$ and average distance $ O(\log \log n)$.

This is a joint work with Lincoln Lu


Peter Dodds, Columbia University
Social Search and the Small World Phenomenon: Experiment and Theory (pdf1), (pdf2), (pdf3,) (pdf4)

Abstract: Social networks have the surprising property of being ``searchable'': ordinary people are capable of directing messages through their network of contacts to reach a specific but distant target person in relatively few steps (<10). This property of searchability is popularly embraced as the notion that the world is `small' and was first studied experimentally by Stanley Milgram in the late 1960's. I will present results from an online version of Milgram's experiment along with a theoretical model that offers an explanation of social network searchability in terms of recognizable personal identities defined along a number of social dimensions. Our model defines a class of searchable networks and a method for searching them that may be applicable to many network search problems including the location of data files in peer-to-peer networks, pages on the World Wide Web, and information in distributed databases.

12.30P.M. Lunch

Pedro Domingos, University of Washington
Modeling and optimizing word of mouth (pdf1), (pdf2)

Abstract: Viral marketing exploits word of mouth in online networks. When successful, it can produce massive returns from minimal investment. Unfortunately, success is very hard to ensure. Our research seeks to make viral marketing less of a black art and more of a quantitative process. We model social networks as Markov random fields, and mine these networks from data. We define the network value of a customer to be the expected gain in profits from other customers that results from marketing to her. We then identify the set of nodes with (ideally) maximal network value, market to them, and unleash a wave of word of mouth. Experiments on the Epinions Web of Trust and other networks demonstrate the benefits of this approach, its robustness to incomplete information, and the extremely high network value of some customers.

Time permitting, I will also briefly survey other Web research at the University of Washington, including predicting the evolution of the Web, combining link and content information in Web search, propagating trust across networks, and modeling the behavior of Web surfers.

Joint work with Matt Richardson.


David Kempe
Maximizing the Spread of Influence in a Social Network (pdf)

Abstract: A social network - the graph of relationships and interactions within a group of individuals - plays a fundamental role as a medium for the spread of information, ideas, and influence among its members. An idea or innovation will appear - for example, the use of cell phones among college students, the adoption of a new drug within the medical profession, or the rise of a political movement in an unstable society - and it can either die out quickly or make significant inroads into the population.

The resulting collective behavior of individuals in a social network has a long history of study in sociology. Recently, motivated by applications to word-of-mouth marketing, Domingos and Richardson proposed the following optimization problem: allocate a given "advertising" budget so as to maximize the (expected) number of individuals who will have adopted a given product or behavior.

In this talk, we will investigate this question under the mathematical models of influence studied by sociologists. We present and analyze a simple approximation algorithm, and show that it guarantees to reach at least a 1-1/e (roughly 63%) fraction of what the optimal solution can achieve, under many quite general models. In addition, we experimentally validate our algorithm, comparing it to several widely used heuristics on a data set consisting of collaborations among scientists.

Joint work with Jon Kleinberg and Eva Tardos

3.30P.M. Coffee

Marcio Menezes, University of Notre Dame
Hot Spots and Universality in Network Dynamics

Most complex networks serve as conduits for various dynamical processes, ranging from mass transfer by chemical reactions in the cell to packet transfer on the Internet. We collected data on the time dependent activity of various natural and technological networks, finding evidence of orders of magnitude differences in the fluxes of individual nodes. This dynamical inhomogeneity reflects the emergence of localized high flux regions or ``hot spots'', carrying an overwhelming fraction of the network's activity. We also find that each system is characterized by a unique scaling law, coupling the flux fluctuations with the total flux on individual nodes, a result of the competition between the system's internal collective dynamics and changes in the external environment [M. Argollo de Menezes and A.-L. Barabasi, Phys. Rev. Lett. 92, 028701 (2004)]. We propose a method to separate these two components, allowing us to predict the relevant scaling exponents and correlate them to the susceptibility of different systems to perturbations. As multichannel measurements are becoming the norm in most fields, the method could help uncover the collective dynamics of a wide array of complex systems.

Joint with A.-L. Barabasi.


Web Structure and Algorithms Workshop
Online Registration Form










This material is based upon work supported by National Science Foundation under Grant No. 0122581.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the
National Science Foundation