Theory/O.R. Schedules-Aladdin Center

Algorithms for multiple sequence alignment of genomes and proteins
Serafim Batzoglou, Stanford
October 1, 2004

Abstract

Without doubt, one of the most worthwhile goals of genomics in the near future is to build detailed and comprehensive reference maps, or alignments, of a large number of genomes from each evolutionary group (mammals, vertebrates, fungi, and others), and to locate within those maps biologically functional and evolutionarily constrained elements. Such a resource will allow the study of biological function and evolution at the level of multiple alignments, rather than single sequences, and will be a tremendously useful starting point for further biological analyses. At the heart of comparative genomics are algorithms for multiple sequence alignment. Alignment is an old problem, but it is still remarkably far from being solved. Moreover, the scale of newly available sequence data poses novel challenges.

In this talk, we will describe algorithms and systems for large-scale multiple alignment of genomic sequences. We will present the LAGAN family of tools for detecting synteny and performing multiple alignment of several genomes. LAGAN is being applied to whole-genome alignment of several mammals, as well as to the ENCODE project, which aims to comprehensively analyze 1% of the human genome.

In addition, we will describe a new objective function for alignment, based on a notion we call "probabilistic consistency". We will present ProbCons, a hidden Markov model-based tool for multiple alignment of proteins, which uses probabilistic consistency instead of the traditional Viterbi (Needleman-Wunsch) step and exhibits much improved accuracy over other aligners.
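
To give a flavor of the idea, the sketch below shows one simple way a consistency step can be formulated; it is an illustration under stated assumptions, not the ProbCons implementation. It assumes that pairwise match probabilities P(x_i ~ y_j) have already been computed (for example, as posteriors from a pair hidden Markov model) and re-estimates them by averaging the evidence routed through every other sequence z. Treating the cases z = x and z = y as contributing the direct matrix itself is an assumption of this sketch, and the function names and the toy numbers are hypothetical.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def consistency_transform(p_xy, p_xz_list, p_zy_list):
    """Re-estimate match probabilities between x and y via third sequences.

    p_xy      : matrix of P(x_i ~ y_j) for the direct pairwise comparison
    p_xz_list : one matrix of P(x_i ~ z_k) per intermediate sequence z
    p_zy_list : one matrix of P(z_k ~ y_j) per intermediate sequence z
    """
    n_seqs = len(p_xz_list) + 2  # x, y, and each intermediate z
    rows, cols = len(p_xy), len(p_xy[0])
    # Assumption: z = x and z = y each contribute the direct matrix once.
    total = [[2.0 * p_xy[i][j] for j in range(cols)] for i in range(rows)]
    for p_xz, p_zy in zip(p_xz_list, p_zy_list):
        via_z = matmul(p_xz, p_zy)  # sum over k of P(x_i~z_k) * P(z_k~y_j)
        for i in range(rows):
            for j in range(cols):
                total[i][j] += via_z[i][j]
    return [[value / n_seqs for value in row] for row in total]


# Toy input: x and y match only weakly, but both match z strongly.
p_xy = [[0.5, 0.1], [0.1, 0.5]]
p_xz = [[0.9, 0.0], [0.0, 0.9]]
p_zy = [[0.9, 0.0], [0.0, 0.9]]
updated = consistency_transform(p_xy, [p_xz], [p_zy])
print(updated)
```

In this toy case the diagonal entries rise above 0.5 and the off-diagonal entries fall below 0.1, because the third sequence supports the diagonal pairing. An aligner could then feed the updated matrix to a dynamic program that maximizes the total match probability along the alignment path.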