Syllabus
This course will cover fundamental algorithmic techniques in computational genomics. It is the first
of a series of four courses (567, 568, 569 and 570) designed to give a comprehensive core knowledge
in bioinformatics and computational biology. This course is a core requirement for BCB students.
It is also meant for students in Computer Engineering and Computer Science who are interested in
doing research in this area, pursue career opportunities in bioinformatics, or want to acquire some
knowledge of this field.
Text
There is no prescribed text. Some supplementary material is provided as aid in studying from time
to time. This may include slides and book chapters, as appropriate.
Prereqs
Computer Science 311 or equivalent. In order understand the material in this class you will
need to be familiar with the basic concepts of algorithm design, including data structures, assymptotic notation, and ideas such as divide and conqure approaches to problem solving.
Topics
Introduction to Biological Sequences
- Types of biological sequences
- Structure of biological sequences
- Interaction between biological sequences
- Encodings of biological sequences
Sequence Alignment
- Applications
- Dynamic Programming
- Global Alignment
- Semi-Global Alignment
- Local Alignment
- Banded Alignment
- Space Saving Tecniques
- Affine Gap Alignment
- Spliced Alignment
String Data Structures and Algorithms
- Lookup Tables
- Suffix Trees
- Suffix Arrays
- Construction
- Longest Common Prefix
- Lowest Common Ancestor
- Applications
Multiple Alignment
- Protein Alignment considerations
- Substitution Matrices
- Multiple Alignment Description
- Scoring Functions
- Exact Solution
- Carillo-Lippman Heuristic
- Star and Tree Alignments
- Progressive Alignment
- ClustalW
- Dialign
- T-Coffee
Database Search
- Exact Match Heuristics
- Seeded Alignments
- Fasta
- Blast
- Improvements to Blast
Genome Assembly
- Sequencing Technology
- Sequencing Process
- Properties of Genomes
- Properties of Genomic Data
- Base Calling
- Superstring Formulation
- Sequencing by Hybridization Formulation
- Overlap Graph Formulation
- String Graph Formulation
- Euler Tour Formulation
- Overlap/Layout/Consensus Paradigm
- Phrap
- Celera
- CAP3
- Arachne
- PaCE Approach to Gene Island Assembly
- Optical Map Assembly
Phylogenetics
- Distance and Character Based Methods
- Parsimony and Perfect Phylogeny
- Heuristic Methods