Faculty Advisor

Natalie Castellana

The ultimate goal of most work on human
genome variation is to establish connections between
genotype and phenotype. We wish to learn from a set
of sites that vary in the population which particular
sites appear to be associated with a given condition,
typically a disease. These sites can then be used to
identify genes involved in the disease, which may help
us understand its causes or develop treatments. The
problem of finding these associations is most difficult
but also most important for the "complex diseases,"
those that are caused by a combination of genetic and
environmental factors. These diseases, which include
heart disease, diabetes, cancers, and Alzheimer's disease,
are the major causes of death in the developed world.
We therefore need to find ways to identify the relatively
faint signals connecting individual variant sites to
these complex diseases among the noise of environmental
factors and other sites. Haplotypes provide a way to
make this problem easier by condensing the information
content of the genome, reducing the search space of
the problem.
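
To make the idea of condensing information concrete, here is a minimal sketch that compares the number of haplotypes observed in a window of sites with the number that could occur in principle. The function name `haplotype_summary` and the toy phased haplotypes are hypothetical and serve only as illustration; they are not data or code from the project.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Summarize a window of phased, biallelic haplotypes.

    haplotypes: list of equal-length strings over {'0', '1'}, one per
    chromosome, where each character is the allele at one variant site.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    sites = len(haplotypes[0])
    if any(len(h) != sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")
    counts = Counter(haplotypes)
    return {
        "sites": sites,
        "possible": 2 ** sites,   # upper bound if sites varied independently
        "observed": len(counts),  # distinct haplotypes actually seen
        "counts": counts,
    }


# Toy data (made up): six chromosomes typed at four sites.
toy = ["0010", "0010", "1101", "1101", "0010", "1111"]
summary = haplotype_summary(toy)
print(summary["sites"], summary["possible"], summary["observed"])
```

In this toy window, four sites could in principle produce 16 haplotypes, but only three are observed, so a test over haplotypes has fewer alternatives to consider than a test over all allele combinations.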

This project will explore the value of applying haplotype
models to genetic association studies. The REU student's
primary task will be conducting a comparison of single-SNP,
haplotype block, and haplotype motif models in terms
of their value to statistical association studies. The
project will begin by generating datasets of human
genetic variation, both by applying existing simulation
tools and by retrieving real data from public repositories.
This step requires the student to develop an understanding of
models of haplotype structure and the current state
of genome variation sequencing. The project will then require
implementing algorithms for haplotype structure inference
using variations of the block and motif methods. This step is
expected to require some algorithmic innovation to
improve on current methods for inferring haplotype
structure under the motif model. The project will further
require implementing several statistical tests for the
single-SNP, block, and motif models to identify genetic
variations correlated with phenotype as well as implementing
benchmarks for evaluating the effectiveness of a given
test. Finally, it will require extensive empirical studies
of the conditions under which various models and metrics
prove superior at detecting variants correlated with
phenotype.

Prerequisites for the project include knowledge
of algorithm design and computer programming, some statistics,
and familiarity with basic concepts from molecular and
population genetics.
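
As an illustration of the simplest of the three approaches named above, the single-SNP test, the following sketch applies a standard 2x2 allelic chi-square test to each site and then a Bonferroni correction across sites. It is one common choice and not necessarily the test the project will use. The function names, the input layout, and the example counts are assumptions made for this sketch.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table.

    case_alleles, control_alleles: (count of allele A, count of allele a).
    Returns (chi2, p_value).
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        # Empty group or monomorphic site: the table carries no evidence.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def scan_snps(counts, alpha=0.05):
    """Test every site and keep those passing a Bonferroni threshold.

    counts: dict mapping a site label to (case_alleles, control_alleles).
    Returns a list of (site, chi2, p_value), sorted by p-value.
    """
    if not counts:
        return []
    threshold = alpha / len(counts)
    hits = []
    for site, (cases, controls) in counts.items():
        chi2, p_value = allelic_chi_square(cases, controls)
        if p_value < threshold:
            hits.append((site, chi2, p_value))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical allele counts for two sites (not real data).
example = {
    "site1": ((60, 40), (40, 60)),
    "site2": ((50, 50), (50, 50)),
}
for site, chi2, p_value in scan_snps(example):
    print(site, round(chi2, 3), round(p_value, 5))
```

A block or motif test could reuse the same scan structure by replacing the 2x2 allele table with a table of haplotype counts, at the cost of more degrees of freedom per test.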