Algorithms for SNP and Haplotype Analysis, Aladdin Center

Organizing Committee: Guy Blelloch, Eran Halperin, R. Ravi, Kathryn Roeder, Russell Schwartz

The recent availability of large-scale genotype data has created new possibilities for tracking down genetic risk factors of complex diseases. Extensive data on human genetic variability in healthy and diseased individuals, predominantly in the form of single nucleotide polymorphisms (SNPs), is giving us the raw material we will need to locate genes and genetic variants that are associated with disease. At the same time, though, finding the faint statistical signals in the growing mass of SNP data poses formidable cross-disciplinary challenges. Much recent attention has focused on the use of haplotypes--evolutionarily conserved sets of contiguous polymorphisms--as a way of exploiting regularities in the data sets to reduce the dimensionality of the problem. Many questions remain about the nature of this haplotype conservation, how best to use it in disease inference studies, and how effective it will prove in practice. Important areas of active research include developing new methods to gather SNP and haplotype data, characterizing the haplotype structure of the human genome, improving the design of disease association studies, and enhancing their statistical power. Promising approaches to these problems are currently emerging from a combination of new methods for gathering genetic polymorphism data, better understanding of the biological basis of genome structure, sophisticated statistical techniques for finding meaning within these data, and advanced computational methods for inferring genetic structure and applying it to various related analysis problems.

The goal of this PROBE is to explore the potential for theoretical computer science to contribute to gathering, analyzing, and applying data on human genetic variation. A workshop on this topic in February 2004 is intended to bring together an interdisciplinary group of geneticists, computer scientists, and statisticians to pool their expertise and discuss current research on these topics. Research supported by the PROBE is focusing primarily on two problem areas. The first is the development of models and algorithms for inferring haplotype structure and applying it to problems in association study design. The second is the development of methods for inferring haplotypes from unphased genotype data.

Second RECOMB Satellite Workshop on Computational Methods for SNPs and Haplotypes. February 20-21, 2004