Lab Worksheet 13¶

For this assignment, your goal is to implement the Needleman-Wunsch algorithm in Python. You can read more about the Needleman-Wunsch algorithm on Wikipedia. The Wikipedia page contains psuedo-code which you might find helpful.

Part 1¶

Write a function that takes two sequences as input, and returns a matrix of scores as we saw in Class 25. You do not have to do the back-tracing, just fill out the matrix.

To get you started, a matrix can be represented in Python as a list of lists. Let's say we want to make a matrix that looks like this:

1	3	5	7
2	3	4	5
5	2	20	3

# Here's how to make the matrix above from a list of lists
my_matrix = []
# Fill out the 0th row
my_matrix.append([1, 3, 5, 7])
# Fill out the 1st row
my_matrix.append([2, 3, 4, 5])
# Fill out the 2nd row
my_matrix.append([5, 2, 20, 3])

# Here is a helper function to print out matrices
def print_matrix(mat):
    # Loop over all rows
    for i in range(0, len(mat)):
        print("[", end = "")
        # Loop over each column in row i
        for j in range(0, len(mat[i])):
            # Print out the value in row i, column j
            print(mat[i][j], end = "")
            # Only add a tab if we're not in the last column
            if j != len(mat[i]) - 1:
                print("\t", end = "")
        print("]\n")

print_matrix(my_matrix)

# To retrieve the value from the 2nd row, in the 0th column, is relatively simple:
print("The value in the 2nd row and the 0th column is:", my_matrix[2][0])

# The format is always my_matrix[row][column].

[1	3	5	7]

[2	3	4	5]

[5	2	20	3]

The value in the 2nd row and the 0th column is: 5

Part 1 Hints¶

Break the problem down into as many small steps as possible. Here are a few hints:

Before you calculate any scores, make an empty matrix of the appropriate size using the zeros() function defined below.
Fill out the 0th row and 0th column before you calculate any other scores.
The max() function will return the maximum value from a list of values. For example max(1,7,3) will return 7.
Make liberal use of the range() function.
Use the print_matrix() function to print out your matrix as frequently as possible. Always make sure that your code is doing what you think it's doing!
Remember, in Python, we start counting from 0.

# Use these values to calculate scores
gap_penalty = -1
match_award = 1
mismatch_penalty = -1

# Make a score matrix with these two sequences
seq1 = "ATTACA"
seq2 = "ATGCT"

# A function for making a matrix of zeroes
def zeros(rows, cols):
    # Define an empty list
    retval = []
    # Set up the rows of the matrix
    for x in range(rows):
        # For each row, add an empty list
        retval.append([])
        # Set up the columns in each row
        for y in range(cols):
            # Add a zero to each column in each row
            retval[-1].append(0)
    # Return the matrix of zeros
    return retval

# A function for determining the score between any two bases in alignment
def match_score(alpha, beta):
    if alpha == beta:
        return match_award
    elif alpha == '-' or beta == '-':
        return gap_penalty
    else:
        return mismatch_penalty

# The function that actually fills out a matrix of scores
def needleman_wunsch(seq1, seq2):
    
    # length of two sequences
    n = len(seq1) 
    m = len(seq2)
    
    # Generate matrix of zeros to store scores
    score = zeros(m+1, n+1)
    
    ########################
    # Your code starts here
    ########################
    
    # Use the following steps as a guide to calculate the score matrix
    
    # 1. Fill out first column
    
    # 2. Fill out first row
    
    # 3. Fill out all other values in the score matrix

    # Return the final score matrix
    return score

# Test out the needleman_wunsch() function
print_matrix(needleman_wunsch(seq1, seq2))

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

Part 2: If that was easy...¶

Modify your code from Part 1 to back-trace through the score matrix and print out the final alignment. HINT: For the back-tracing, you'll want to use a while loop (or several of them).

def needleman_wunsch(seq1, seq2):
    
    # Store length of two sequences
    n = len(seq1)  
    m = len(seq2)
    
    # Generate matrix of zeros to store scores
    score = zeros(m+1, n+1)
   
    # Calculate score table
    
    # Create variables to store alignment
    align1 = ""
    align2 = ""
    
    # Back-trace from bottom right of the score matrix and compute the alignment 
    
    return(align1, align2)

output1, output2 = needleman_wunsch(seq1, seq2)

print(output1 + "\n" + output2)