Lab Worksheet 10

Problem 1: In bioinformatics, we are often interested in determining whether or not two DNA or amino acid sequences are similar. One simple measure of similarity is called pairwise sequence identity. To calculate pairwise sequence identity, we take two sequences, count the number of positions in which both sequences share the same nucleotide or amino acid, and then divide by the total number of positions. For example, say we have these two DNA sequences:

      Position: 1 2 3 4 5 6
    Sequence 1: A T C G T A
    Sequence 2: A T G A G A
Identical(y/n): y y n n n y

There are 3 positions that match out of 6 total positions, so the sequence identity is 50% (3/6).

Write a function that calculates the pairwise sequence identity for any two sequences of the same length. (Do not worry about properly aligning the two sequences. Sequence alignment is a concept we will return to later.) Your function should take two arguments: seq1 and seq2. Make sure that your function checks for equal sequence lengths. If the input sequences are of different lengths, your function should return an error message. Otherwise, your function should return the pairwise sequence identity as a percentage.
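One possible shape for such a function is sketched below, as a starting point rather than the expected answer. The name pairwise_identity, the wording of the error messages, and the extra check for empty input are assumptions made for illustration; only the arguments seq1 and seq2 come from the problem statement.

```python
def pairwise_identity(seq1, seq2):
    """Return the percent identity of two equal-length sequences.

    Returns an error message (a string) instead of a number if the
    sequences differ in length or are empty. The empty-input check is
    an added assumption to avoid dividing by zero.
    """
    if len(seq1) != len(seq2):
        return "Error: sequences must be the same length."
    if len(seq1) == 0:
        return "Error: sequences must not be empty."

    # zip() pairs up the characters at each position; each comparison is
    # True (counted as 1) when the two characters match.
    matches = sum(a == b for a, b in zip(seq1, seq2))

    # Convert the fraction of matching positions to a percentage.
    return 100 * matches / len(seq1)


# The worked example from above: 3 matches out of 6 positions.
print(pairwise_identity("ATCGTA", "ATGAGA"))  # 50.0

# Sequences of different lengths produce the error message.
print(pairwise_identity("ATCG", "ATC"))
```

Returning a string for the error case follows the wording of the problem; raising an exception such as ValueError would be a common alternative design.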

Finally, use your function to calculate the pairwise sequence identity of the two amino acid sequences below.

In [1]:

# Your code goes here.

Problem 2: Open the file "road_not_taken.txt" from In-class Worksheet 19. Randomly shuffle the order of the lines in the poem, then save the shuffled lines to a file called "road_not_taken_shuffled.txt". Open your new file and print the first 10 lines to confirm that the poem has been shuffled. HINT: You can use the function shuffle() to shuffle the items in a list. Note that shuffle() reorders the list in place and does not return a new list.
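The read, shuffle, write, and check steps can be sketched on a stand-in file, as shown below. This is an illustration under stated assumptions, not the expected answer: the file names example.txt and example_shuffled.txt and the placeholder lines are hypothetical, and a temporary directory is used so the sketch runs without "road_not_taken.txt".

```python
import os
import tempfile
from random import shuffle

# Stand-in input (hypothetical): a small file of 12 placeholder lines
# written to a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "example.txt")
out_path = os.path.join(workdir, "example_shuffled.txt")
with open(in_path, "w") as handle:
    handle.write("".join(f"line {i}\n" for i in range(1, 13)))

# Read all lines into a list; each item keeps its trailing newline.
with open(in_path) as handle:
    lines = handle.readlines()

# shuffle() reorders the list in place and returns None,
# so its result should not be assigned back to the variable.
shuffle(lines)

# Write the shuffled lines to the new file.
with open(out_path, "w") as handle:
    handle.writelines(lines)

# Reopen the new file and print its first 10 lines.
with open(out_path) as handle:
    shuffled = handle.readlines()
for line in shuffled[:10]:
    print(line, end="")
```

If the last line of an input file has no trailing newline, it can end up joined to the line written after it once the order changes, so it may be worth checking for that before writing.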

In [1]:
# You will need the library "random" to shuffle the lines in the poem.
from random import shuffle

# Your code goes here.