SDS 348, Spring 2019
This is the home page for class SDS 348, Computational Biology and Bioinformatics. All relevant course materials will be posted here.
Syllabus: SDS348_syllabus_spring2019.pdf
Lectures
1. Jan 22, 2019 – Introduction, R Markdown
- Slides: class1.pdf
- R Markdown basics: https://rmarkdown.rstudio.com/authoring_basics.html
- Class compute servers:
- You can download R from here: https://cran.r-project.org/
- You can download RStudio from here: https://www.rstudio.com/products/rstudio/download/
- In-class worksheet:
2. Jan 24, 2019 – R review
- Slides: class2.pdf
- Biostats supplement on regression modeling: statistical_modeling.pdf
- General R tutorial (fairly long and detailed): http://www.cyclismo.org/tutorial/R/index.html
- In-class worksheet:
3. Jan 29, 2019 – Data visualization with ggplot2
- Slides: class3.pdf
- R for Data Science: https://r4ds.had.co.nz/
- ggplot2 reference manual: https://ggplot2.tidyverse.org/
- ggplot2 video tutorial: http://varianceexplained.org/RData/lessons/lesson2/segment1/
- In-class worksheet:
4. Jan 31, 2019 – Data visualization with ggplot2
- Slides: class4.pdf
- Visualization book: Fundamentals of Data Visualization
- Tidyverse style guide: style.tidyverse.org
- In-class worksheet:
5. Feb 5, 2019 – Working with tidy data
- Slides: class5.pdf
- dplyr chapter in R for Data Science: Chapter 5: Data transformation
- dplyr package on the tidyverse website: https://dplyr.tidyverse.org/
- Tidy data paper by Wickham: J. Stat. Soft. 59:10, 2014
- In-class worksheet:
6. Feb 7, 2019 – Working with tidy data
- Slides: class6.pdf
- In-class worksheet:
7. Feb 12, 2019 – Working with tidy data
- Slides: class7.pdf
- In-class worksheet:
8. Feb 14, 2019 – Rearranging data tables with tidyr
- Slides: class8.pdf
- tidyr vignette: https://tidyr.tidyverse.org/articles/tidy-data.html
- In-class worksheet:
9. Feb 19, 2019 – Principal Components Analysis (PCA)
- Slides: class9.pdf
- Intro to PCA: http://setosa.io/ev/principal-component-analysis/
- PCA tutorial with mathematical background: https://arxiv.org/pdf/1404.1100.pdf
- In-class worksheet:
10. Feb 21, 2019 – k-means clustering
- Slides: class10.pdf
- Interactive k-means demonstration: https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
- Stack Overflow post on choosing the right number of clusters
- Medium article: The 5 clustering algorithms data scientists need to know.
- In-class worksheet:
11. Feb 26, 2019 – Binary prediction/logistic regression
- Slides: class11.pdf
- Wikipedia page on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression
- In-class worksheet:
12. Feb 28, 2019 – Sensitivity/Specificity, ROC curves
- Slides: class12.pdf
- Wikipedia page on sensitivity and specificity: https://en.wikipedia.org/wiki/Sensitivity_and_specificity
- Wikipedia page on ROC curves: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
- ROC animations: https://github.com/dariyasydykova/open_projects/tree/master/ROC_animation
- In-class worksheet:
13. Mar 5, 2019 – Training and test data sets, cross-validation
- Slides: class13.pdf
- Wikipedia page on cross-validation: Cross-validation (statistics)
- In-class worksheet:
14. Mar 7, 2019 – Introduction to Python, basic data structures
- Slides: class14.pdf
- Alternative 1 to educcomp: Google Colaboratory
- Alternative 2 to educcomp: Anaconda
- Official Python 3 tutorial: https://docs.python.org/3/tutorial/
- Chapter 3 of the official tutorial: An informal introduction
- In-class worksheet:
15. Mar 12, 2019 – Control flow in Python
- Slides: class15.pdf
- Sections 4.1–4.5 of the official tutorial: More Control Flow Tools
- Section 5.5 of the official tutorial: Dictionaries
- In-class worksheet:
16. Mar 14, 2019 – Functions in Python
- Slides: class16.pdf
- Sections 4.6 and 4.7 of the official tutorial: Defining functions
- In-class worksheet:
17. Mar 26, 2019 – More on Python data structures, classes
- Slides: class17.pdf
- Chapter 9 of the official tutorial: Classes
- In-class worksheet:
18. Mar 28, 2019 – Working with files
- Slides: class18.pdf
- Section 7.2 of the official tutorial: Reading and writing files
- In-class worksheet:
19. Apr 2, 2019 – Introduction to Biopython
- Slides: class19.pdf
- Biopython website: https://biopython.org
- Official Biopython tutorial: https://biopython.org/DIST/docs/tutorial/Tutorial.html
- NCBI Entrez/PubMed website: https://www.ncbi.nlm.nih.gov/
- In-class worksheet:
20. Apr 4, 2019 – Working with gene features and genomes
- Slides: class20.pdf
- Biopython Tutorial on sequence features: SeqFeature objects
- Official feature documentation from the International Nucleotide Sequence Database Collaboration: Feature Key Reference
- In-class worksheet:
21. Apr 9, 2019 – Running queries on Entrez
- Slides: class21.pdf
- Biopython SeqIO documentation: SeqIO
- In-class worksheet:
22. Apr 11, 2019 – Regular expressions
- Slides: class22.pdf
- Python regular expression editor: https://pythex.org/
- Regular expression visualization: https://regexper.com/
- Official Python regular expression documentation: Regular Expression HOWTO
- Alternative regular expression tutorial: Python Regular Expressions
- In-class worksheet:
23. Apr 16, 2019 – Using regular expressions to analyze data
- Slides: class23.pdf
- Python regular expression editor: https://pythex.org/
- Regular expression visualization: https://regexper.com/
- Regex crosswords: regexcrossword.com
- In-class worksheet:
24. Apr 18, 2019 – Using regular expressions to analyze data
- Slides: class24.pdf
- Python regular expression editor: https://pythex.org/
- Regular expression visualization: https://regexper.com/
- In-class worksheet:
25. Apr 23, 2019 – Aligning sequences
- Slides: class25.pdf
- Wikipedia page on the Needleman–Wunsch algorithm: Needleman–Wunsch algorithm
- Alignment software:
- Example alignments
- In-class worksheet:
26. Apr 25, 2019 – Global and local alignments, BLAST
- Slides: class26.pdf
- Wikipedia page on the Smith–Waterman algorithm: Smith–Waterman algorithm
- NCBI BLAST search: https://blast.ncbi.nlm.nih.gov/Blast.cgi
- Wikipedia page on BLAST: BLAST
- Biopython BLAST documentation: Chapter 7: BLAST
- In-class worksheet:
27. Apr 30, 2019 – Multiple sequence alignments and phylogenetic trees
- Slides: class27.pdf
- Wikipedia page on multiple sequence alignments: Multiple sequence alignment
- Wikipedia page on phylogenetic trees: Phylogenetic trees
- In-class exercises:
28. May 2, 2019 – Working with protein structures
- Slides: class28.pdf
- Obtain PyMOL: Educational-use PyMOL
- PyMOL tutorial: Practical PyMOL for Beginners
- Protein data bank: https://www.rcsb.org/
29. May 7, 2019 – Plotting geospatial data
- Slides: class29.pdf
- Fundamentals of dataviz book chapter: Visualizing geospatial data
- R sf package: Simple Features for R vignette
- In-class worksheet:
30. May 9, 2019 – Making animated plots
- Slides: class30.pdf
- R gganimate package: https://gganimate.com/
- Hans Rosling video: 200 years in 4 minutes
- In-class worksheet:
Homeworks
All homeworks are due by 4:00pm on their due date and must be submitted as PDF files on Canvas.
- Homework 1 (due Jan 29, 2019)
- Homework 2 (due Feb 5, 2019)
- Homework 3 (due Feb 12, 2019)
- Homework 4 (due Feb 19, 2019)
- Homework 5 (due Mar 5, 2019)
- Homework 6 (due Mar 12, 2019)
- Homework 7 (due Mar 26, 2019)
- Homework 8 (due Apr 9, 2019)
- Homework 9 (due Apr 16, 2019)
- Homework 10 (due Apr 23, 2019)
- Homework 11 (due Apr 30, 2019)
Labs
1. Jan 23, 2019
- Slides: lab1.pdf
- Guide to converting from HTML to PDF: html_to_pdf_guide.pdf
- Lab worksheet:
2. Jan 30, 2019
- Fundamentals of Data Visualization (chapter 5): Directory of visualizations
- Lab worksheet:
3. Feb 6, 2019
- Lab worksheet:
4. Feb 13, 2019
- Slides: lab4.pdf
- Animations for different joins in dplyr: Tidy animated verbs
- Lab worksheet:
5. Feb 20, 2019
- Lab worksheet:
6. Feb 27, 2019
- Lab worksheet:
7. Mar 6, 2019
- Lab worksheet:
8. Mar 13, 2019
- Lab worksheet:
9. Mar 27, 2019
- Lab worksheet:
10. Apr 3, 2019
- Lab worksheet:
11. Apr 10, 2019
- Lab worksheet:
12. Apr 17, 2019
- Lab worksheet:
13. Apr 24, 2019
- Lab worksheet:
14. May 1, 2019
- Lab worksheet:
Projects
All projects are due by 4:00pm on their due date. Projects must be submitted on Canvas both as a PDF file and as source code (plus data where needed).
- Project 1 (due Feb 26, 2019):
- Project 2 (due Apr 2, 2019):
- SDS 385 Assignment (due Apr 16, 2019, grad students only):
- Project 3 (due May 9, 2019):