ALADDIN Center, Carnegie Mellon University, Carnegie Mellon Computer Science Department, School of Computer Science
Abstracts
The Joint ALADDIN/Theory/Operations Research Seminar

Algorithms for multiple sequence alignment of genomes and proteins
Serafim Batzoglou, Stanford
October 1, 2004

Abstract

Without doubt, one of the most worthwhile goals of genomics in the near future is to build detailed and comprehensive reference maps, or alignments, of a large number of genomes from each evolutionary scope (mammals, vertebrates, fungi, and others), and to locate within those maps biologically functional and evolutionarily constrained elements. Such a resource will allow the study of biological function and evolution at the level of multiple alignments, rather than single sequences, and will be a tremendously useful starting point for further biological analyses. At the heart of comparative genomics are algorithms for multiple sequence alignment. Alignment is an old problem, but it is still remarkably far from being solved. Moreover, the scale of newly available sequence data poses novel challenges.

In this talk, we will describe algorithms and systems for large-scale multiple alignment of genomic sequences. We will present the LAGAN family of tools for detecting synteny and performing multiple alignment between several genomes. LAGAN is being applied to whole-genome alignment of several mammals, as well as to the ENCODE project of comprehensively analyzing 1% of the human genome.

In addition, we will describe a new objective function for alignment, based on a notion we call "probabilistic consistency". We will present ProbCons, a hidden Markov model-based tool for multiple alignment of proteins, which uses probabilistic consistency instead of the traditional Viterbi (Needleman-Wunsch) step and exhibits much improved accuracy over other aligners.
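
For readers unfamiliar with the traditional step that the abstract says ProbCons replaces, the following is a minimal sketch of textbook Needleman-Wunsch global alignment for two sequences. It is not the ProbCons or LAGAN implementation. The function name `needleman_wunsch` and the scoring parameters (`match`, `mismatch`, and a linear `gap` penalty) are illustrative assumptions, not values taken from the talk.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `needleman_wunsch("ACGT", "AGT")` returns a score together with two equal-length gapped strings. The dynamic program commits to a single highest-scoring path, which is the behavior the abstract contrasts with an objective based on probabilistic consistency.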

This material is based upon work supported by the National Science Foundation under Grant No. 0122581.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.