Homework 8

Enter your name and EID here

This homework is due on April 13, 2020 at 12:00pm. Please submit as a PDF file on Canvas. Before submission, please re-run all cells by clicking "Kernel" and selecting "Restart & Run All."

Problem 1 (3 points): The interleukin 6 gene (IL6) encodes a cytokine that mediates a variety of immune response pathways in humans. Download the file "IL6_gene_human_lower.txt" to your computer, upload the file to your Jupyter session, then read the sequence in line by line using open() and readlines(). Print the sequence of the gene with all nucleotides converted to uppercase and all whitespace removed.
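As a minimal sketch of the open() and readlines() pattern, the example below writes a tiny stand-in file and reads it back line by line. The file name "demo_seq.txt" and its contents are made up for illustration and are not the IL6 file; the sketch also assumes that whitespace may appear both within and at the end of each line.

```python
import os
import tempfile

# Hypothetical stand-in file; NOT the real IL6 sequence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_seq.txt")
with open(demo_path, "w") as handle:
    handle.write("atg cc\ngta\n")

# readlines() returns a list with one string per line, newlines included.
with open(demo_path) as handle:
    lines = handle.readlines()

# split() with no arguments splits on any whitespace, so joining the pieces
# drops spaces, tabs, and newlines in one step.
cleaned = "".join("".join(line.split()) for line in lines).upper()
print(cleaned)  # ATGCCGTA
```

Using a with block closes the file automatically, even if an error occurs while reading.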

In [1]:
# your code here

Problem 2 (4 points): In bioinformatics, k-mers are all the possible subsequences of length k contained in a read obtained through DNA sequencing. For example, if the DNA sequencing read is "ATCATCATG", then the 3-mers in that read are "ATC" (which occurs twice), "TCA" (which occurs twice), "CAT" (which occurs twice), and "ATG" (which occurs once). You can read more about k-mers on Wikipedia.
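To make the definition concrete, here is a small sketch that lists every length-k window of the example read in order. The helper name list_kmers is hypothetical and chosen only for this illustration; it lists the windows but does not count them.

```python
def list_kmers(read, k):
    """Return every length-k window of read, in order (hypothetical helper)."""
    # The last valid start index is len(read) - k, hence the + 1 in range().
    return [read[i:i + k] for i in range(len(read) - k + 1)]

print(list_kmers("ATCATCATG", 3))
# ['ATC', 'TCA', 'CAT', 'ATC', 'TCA', 'CAT', 'ATG']
```

A read of length n contains n - k + 1 overlapping k-mers, so this 9-nucleotide read yields seven 3-mers, four of them distinct.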

a) Write a function that takes a string of nucleotides as input and returns a dictionary with all 3-mers present in that string, and the number of times that each 3-mer occurs. Then, validate your function by finding the 3-mers in the DNA sequence test_seq defined below.

The output of your function should be a dictionary structured like the one below, which shows the counts for the example read "ATCATCATG" (the output for test_seq will have more entries and different counts):

{"ATC": 2, "TCA": 2, "CAT": 2, "ATG": 1}

where each key is a 3-mer itself (e.g., "ATC") and each value is the number of times that 3-mer occurs. Visually inspect the output of your function to ensure it is counting the 3-mers in the test sequence correctly. *HINT: You will need to use range() and len() to loop through 3-mer slices of a sequence.*
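One possible shape for such a function, offered as a sketch rather than the required solution: it walks over the start positions with range() and len(), slices out each window, and tallies it in a dictionary. The name count_kmers and the default k=3 are assumptions made for this illustration.

```python
def count_kmers(seq, k=3):
    """Count each length-k substring of seq (illustrative; the name is hypothetical)."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # get() returns 0 the first time a k-mer is seen.
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts

print(count_kmers("ATCATCATG"))
# {'ATC': 2, 'TCA': 2, 'CAT': 2, 'ATG': 1}
```

Note that the built-in str.count() counts only non-overlapping matches, so it is not a reliable cross-check for k-mers that can overlap themselves (for example "AAA" in "AAAA").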

In [2]:
# test case; verify your code works by finding all 3-mers in this sequence
test_seq = "ATCATGCGCATG"

# your code here

Problem 3 (3 points): Download the file "covid19_genome.txt" to your computer, upload the file to your Jupyter session, then load the sequence using open() and read(). Use your function from Problem 2 to count the different 3-mers in the sequence.
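Unlike readlines(), read() returns the entire file as one string, with any newline characters still embedded. The sketch below illustrates this on a made-up stand-in file ("demo_genome.txt" and its contents are hypothetical, not the real genome). Whether the actual file contains line breaks is an assumption worth checking, because any window that spans a line break would otherwise be counted as a 3-mer containing "\n".

```python
import os
import tempfile

# Hypothetical stand-in file; NOT the real genome.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_genome.txt")
with open(demo_path, "w") as handle:
    handle.write("ATTA\nAAGG\n")

# read() returns the whole file as a single string, newlines included.
with open(demo_path) as handle:
    raw = handle.read()

# Remove the line breaks so that only nucleotides remain.
sequence = raw.replace("\n", "")
print(repr(raw))  # 'ATTA\nAAGG\n'
print(sequence)   # ATTAAAGG
```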

In [3]:
# your code here