# Homework 8

Enter your name and EID here

This homework is due on April 3, 2018 at 7:00pm. Please submit as a PDF file on Canvas. Before submission, please re-run all cells by clicking "Kernel" and selecting "Restart & Run All."

Problem 1 (5 points): In bioinformatics, k-mers are all the contiguous subsequences of length k contained in a read obtained through DNA sequencing. Overlapping subsequences are counted separately. For example, if the DNA sequencing read is "ATCATCATG", then the 3-mers in that read are "ATC" (which occurs twice), "TCA" (which occurs twice), "CAT" (which occurs twice), and "ATG" (which occurs once). You can read more about k-mers on Wikipedia.
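The 3-mer example above can be reproduced with a short sliding-window sketch. The function name `count_kmers` and its parameters are illustrative assumptions, not part of the assignment; this is one possible approach rather than the expected solution.

```python
def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq (hypothetical helper)."""
    counts = {}
    # A sequence of length n contains n - k + 1 windows of length k.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # dict.get returns 0 the first time a k-mer is seen.
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


example_counts = count_kmers("ATCATCATG", 3)
print(example_counts)
# → {'ATC': 2, 'TCA': 2, 'CAT': 2, 'ATG': 1}
```

The window start index runs from 0 to `len(seq) - k`, so a read shorter than k yields an empty dictionary.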

a) Write a function that takes a string of nucleotides as input and returns a dictionary with all 4-mers present in that string, and the number of times that each 4-mer occurs. Then use your function to find the 4-mers in the DNA sequence my_seq defined below.

The output of your function should be a dictionary that is structured like this (although it will have several more entries):

{"ATCA": 2, "TCAT": 2, "CATC": 1}

where each key is a 4-mer itself (e.g., "ATCA") and each value is the number of times that 4-mer occurs.

b) Come up with a short DNA sequence and use it to verify manually that your function generates the correct result. Explain your reasoning in 2-3 sentences.

In [1]:
# Find all 4-mers in this sequence
my_seq = "CAGCCCAATCAGGCTCTACTGCCACTAAACTTACGCAGGATATATTTACGCCGACGTACT"