Algorithms for multiple sequence
alignment of genomes and proteins
Serafim Batzoglou, Stanford
October 1, 2004
Abstract
Without doubt, one of the most worthwhile goals of genomics in
the near future is to build detailed and comprehensive reference
maps, or alignments, of a large number of genomes at each evolutionary
scope (mammals, vertebrates, fungi, and others), and to locate within
those maps biologically functional and evolutionarily constrained
elements. Such a resource will allow the study of biological function
and evolution at the level of multiple alignments, rather than
single sequences, and will be a tremendously useful starting point
for further biological analyses. At the heart of comparative genomics
are algorithms for multiple sequence alignment. Alignment is an
old problem, but it is still remarkably far from being solved. Moreover,
the scale of newly available sequence data poses novel challenges.
In this talk, we will describe algorithms and systems for large-scale
multiple alignment of genomic sequences. We will present the LAGAN
family of tools for detecting synteny and performing multiple
alignment among several genomes. LAGAN is being applied to whole-genome
alignment of several mammals, as well as to the ENCODE project,
which aims to comprehensively analyze 1% of the human genome.
In addition, we will describe a new objective function for alignment,
based on a notion we call "probabilistic consistency".
We will present ProbCons, a hidden Markov model-based tool for
multiple alignment of proteins, which uses probabilistic consistency
instead of the traditional Viterbi (Needleman-Wunsch) step and
exhibits much improved accuracy over other aligners.
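
For readers unfamiliar with the traditional step mentioned above, the
following is a minimal sketch of Needleman-Wunsch global pairwise
alignment. It is not ProbCons or LAGAN code. The function name
needleman_wunsch and the scoring values (match, mismatch, gap) are
assumptions chosen for illustration, and a simple linear gap penalty
is used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b by dynamic programming.

    Illustrative scoring only (assumed values): +1 match, -1 mismatch,
    -1 per gap character. Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table row by row
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(top)
    print(bottom)
```

The sketch returns a single highest-scoring alignment under the chosen
scores. When several alignments tie, the traceback prefers the diagonal
move, which is an arbitrary choice made here for simplicity.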