This is the home page for class SDS 348, Computational Biology and Bioinformatics. All relevant course materials will be posted here.

(For the latest version of this class, with updated materials, please check here.)

Syllabus and schedule: SDS348_syllabus_spring2015.pdf

Lectures

1. Jan 20, 2015 – Introduction

2. Jan 22, 2015 – R review, R Markdown

3. Jan 27, 2015 – Data visualization with ggplot2

4. Jan 29, 2015 – Data visualization with ggplot2

5. Feb 3, 2015 – Working with tidy data

6. Feb 5, 2015 – Working with tidy data

7. Feb 10, 2015 – Working with tidy data

8. Feb 12, 2015 – Rearranging data tables with tidyr

9. Feb 17, 2015 – Principal Components Analysis (PCA)

10. Feb 19, 2015 – k-means clustering

11. Feb 24, 2015 – Binary prediction/logistic regression

12. Feb 26, 2015 – Sensitivity/Specificity, ROC curves

13. Mar 3, 2015 – Training and test data sets, cross-validation

14. Mar 5, 2015 – Cancelled due to inclement weather

15. Mar 10, 2015 – Installing and running Python

16. Mar 12, 2015 – Data structures in Python

17. Mar 24, 2015 – Control flow (if/for) in Python

18. Mar 26, 2015 – Functions in Python

19. Mar 31, 2015 – Introduction to Biopython

20. Apr 2, 2015 – More Biopython, File Input/Output

21. Apr 7, 2015 – Working with genomes

22. Apr 9, 2015 – Regular expressions

23. Apr 14, 2015 – Using regular expressions to analyze data

24. Apr 16, 2015 – Aligning sequences

25. Apr 21, 2015 – Multiple sequence alignments and phylogenetic trees

26. Apr 23, 2015 – BLAST

27. Apr 28, 2015 – Working with protein structures

28. Apr 30, 2015 – PDBs, protein structure, and dynamics

Guest lecture by Austin Meyer

29. May 5, 2015 – Epidemiology

Guest lecture by Steve Bellan

30. May 7, 2015

Guest lecture by Howard Ochman

Homeworks

Projects

  • Project 1: Project1.Rmd (due Feb 24, 2015)
  • Project 2: Project2.Rmd (due Mar 31, 2015)
    • Project 2 Clarifications:
      • Regarding ROC curves for part 1: you should make a single plot containing two ROC curves. Both ROC curves should assess performance of the final, selected model. One of the curves should be made from the training data, and the other should be made from the test data.
      • Hint for making the logistic curve in part 1: you will need to use the predict() function (see Class 11).
        • The general code predict(model, data, type = "response") returns a vector of fitted values (probabilities) computed by the given model for the given data.
        • The general code predict(model, data) returns a vector of linear predictors (log-odds, for a logistic regression) computed by the given model for the given data.
  • Project 3: Project3.pdf (due May 7, 2015)
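
To illustrate the Project 2 hint about predict(), here is a minimal sketch of the two calls side by side. It is not the project solution: the built-in mtcars data set, the model formula, and the variable names (model, probs, lin_pred) are assumptions chosen only for illustration.

```r
# Illustrative data only: the built-in mtcars data set, not the project data.
# am is a 0/1 outcome (transmission type), wt is the car's weight.
model <- glm(am ~ wt, data = mtcars, family = binomial)

# Fitted probabilities, each between 0 and 1
probs <- predict(model, mtcars, type = "response")

# Linear predictors (log-odds); this is what predict() returns for a
# glm when no type argument is given
lin_pred <- predict(model, mtcars)

# The two are related by the logistic function, which plogis() computes
all.equal(plogis(lin_pred), probs)
```

Plotting probs against the predictor should trace out the logistic curve, while lin_pred lies on a straight line in the predictor.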

Discussion sections

Jan. 21, 2015

Jan. 28, 2015

March 11, 2015

  • Python 1: working with Python data structures. here

April 1, 2015

April 8, 2015