Enter your name and EID here
This homework is due on April 13, 2020 at 12:00pm. Please submit as a PDF file on Canvas. Before submission, please re-run all cells by clicking "Kernel" and selecting "Restart & Run All."
Problem 1 (3 points): The interleukin 6 gene (IL6) encodes a cytokine that mediates a variety of immune response pathways in humans. Download the file "IL6_gene_human_lower.txt" to your computer, upload the file to your Jupyter session, then read the sequence in line-by-line using open() and readlines(). Print out the sequence of the gene such that all nucleotides have been converted to uppercase and all white space (including line breaks) has been removed.
# your code here
Problem 2 (4 points): In bioinformatics, k-mers are all the contiguous subsequences of length k in a read obtained through DNA sequencing. For example, if the DNA sequencing read is "ATCATCATG", then the 3-mers in that read are "ATC" (which occurs twice), "TCA" (which occurs twice), "CAT" (which occurs twice), and "ATG" (which occurs once). You can read more about k-mers on Wikipedia.
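As a minimal sketch of the definition only (it lists the 3-mers but does not count them, so it is not a solution to the problem), the overlapping windows in the example read can be enumerated as shown here. The variable names read, k, and kmers are chosen purely for illustration.

```python
# Illustrative only: list every 3-mer in the example read "ATCATCATG".
read = "ATCATCATG"
k = 3

# A read of length n contains n - k + 1 overlapping k-mers,
# one starting at each index from 0 to n - k.
kmers = [read[i:i + k] for i in range(len(read) - k + 1)]

print(kmers)
# ['ATC', 'TCA', 'CAT', 'ATC', 'TCA', 'CAT', 'ATG']
```

The window advances one nucleotide at a time, which is why consecutive 3-mers overlap by two characters.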
a) Write a function that takes a string of nucleotides as input and returns a dictionary with all 3-mers present in that string, and the number of times that each 3-mer occurs. Then, validate your function by finding the 3-mers in the DNA sequence test_seq defined below.
The output of your function should be a dictionary structured like the one below, which shows the counts for the example read "ATCATCATG" (the output for test_seq will have several more entries and different counts):
{"ATC": 2, "TCA": 2, "CAT": 2, "ATG": 1}
where each key is a 3-mer itself (e.g., "ATC") and each value is the number of times that 3-mer occurs. Visually inspect the output of your function to ensure it is counting the 3-mers in the test sequence correctly. *HINT: You will need to use range() and len() to loop through 3-mer slices of a sequence.*
# test case; verify your code works by finding all 3-mers in this sequence
test_seq = "ATCATGCGCATG"
# your code here
Problem 3 (3 points): Download the file "covid19_genome.txt" to your computer, upload the file to your Jupyter session, then read the sequence in using open() and read(). Use your function from Problem 2 to count the different 3-mers in the sequence.
# your code here