Statistical Analysis and Modeling for Bioinformatics and Biomedical Applications


This course introduces the essential probabilistic and statistical methods used in bioinformatics and biomedical research. You’ll learn the fundamentals of probability: basic notions, the probability axioms, conditional probability, discrete and continuous random variables, probability distributions, expectation and variance, inference for a binomial proportion, the normal distribution, and the central limit theorem.

The course also covers statistics, including the following topics:

  • Estimating statistical parameters and fitting probability distributions to data
  • Testing hypotheses and assessing goodness of fit
  • T-tests and confidence intervals
  • Analysis of variance (ANOVA)
  • T-tests versus ANOVA in the analysis of microarray data
  • Relevant applications, including stochastic processes, Markov chains and hidden Markov models, pairwise alignment using HMMs, statistics applied to machine learning, probabilistic graphical models, and the Broad and Bayesian approaches to testing a null hypothesis

You will learn the basics of the R programming language in R-based labs that apply the theory. Lab exercises will teach you to infer a binomial proportion, perform statistical analysis of microarray data in R, compare t-tests with ANOVA, and carry out pairwise alignment using HMMs. The course also introduces Weka, a popular machine learning software package.

You will be graded on several homework assignments, one midterm, and a final project. Lab assignments are not turned in. Calculus is not required to pass the class, but familiarity with it helps in understanding the conceptual framework. Online lecture notes outlining the relevant calculus will be provided. Previous programming experience is not required.

Note(s): This course was formerly titled "Data Analysis and Modeling for Bioinformatics."

Prerequisites:


Offering code Offering title
BINF.X404 Statistics