Class 25: Needleman-Wunsch¶

April 28, 2020

In this class, our goal is to implement the Needleman-Wunsch algorithm in Python. You can read more about the Needleman-Wunsch algorithm on Wikipedia. The Wikipedia page contains pseudo-code which you might find helpful.

Since the Needleman-Wunsch algorithm works on a matrix of scores, it is helpful to discuss how to represent a matrix in Python. We can do so by creating a list of lists. Let's say we want to make a matrix that looks like this:

1	3	5	7
2	3	4	5
5	2	20	3

In Python, we would create it as follows:

my_matrix = []
# Fill out the 0th row
my_matrix.append([1, 3, 5, 7])
# Fill out the 1st row
my_matrix.append([2, 3, 4, 5])
# Fill out the 2nd row
my_matrix.append([5, 2, 20, 3])

print(my_matrix)

[[1, 3, 5, 7], [2, 3, 4, 5], [5, 2, 20, 3]]

However, when we print the matrix, it is not easy to read, because it is shown as a list of lists. So it is helpful to define a function print_matrix() that prints a matrix in a nicer form:

def print_matrix(mat):
    # Loop over all rows
    for i in range(0, len(mat)):
        print("[", end = "")
        # Loop over each column in row i
        for j in range(0, len(mat[i])):
            # Print out the value in row i, column j
            print(mat[i][j], end = "")
            # Only add a tab if we're not in the last column
            if j != len(mat[i]) - 1:
                print("\t", end = "")
        print("]\n")

print_matrix(my_matrix)

[1	3	5	7]

[2	3	4	5]

[5	2	20	3]

To retrieve an indiviual value from a matrix, we use code of the form my_matrix[row][column]. So, to retrieve for example the value in the 0th column of the 2nd row, we write:

print("The value in the 2nd row and the 0th column is:", my_matrix[2][0])

The value in the 2nd row and the 0th column is: 5

It will also be helpful to have a function that can create a matrix of zeros.

# A function for making a matrix of zeroes
def zeros(rows, cols):
    return [[0 for i in range(cols)] for j in range(rows)]

zero_matrix = zeros(3, 5) # make zero matrix with 3 rows and 5 columns
print_matrix(zero_matrix)

[0	0	0	0	0]

[0	0	0	0	0]

[0	0	0	0	0]

Problems¶

Problem 1:

Write a function that takes two sequences as input and returns a matrix of Needleman-Wunsch scores. You do not have to do the back-tracing, just fill out the matrix.

Break the problem down into as many small steps as possible. Here are a few hints:

Before you calculate any scores, make an empty matrix of the appropriate size using the zeros() function defined below.
Fill out the 0th row and 0th column before you calculate any other scores.
The max() function will return the maximum value from a list of values. For example max(1, 7, 3) will return 7.
Make liberal use of the range() function.
Use the print_matrix() function to print out your matrix as frequently as possible. Always make sure that your code is doing what you think it's doing!
Remember, in Python, we start counting from 0.

# Use these values to calculate scores
gap_penalty = -1
match_award = 1
mismatch_penalty = -1

# Make a score matrix with these two sequences
seq1 = "ATTACA"
seq2 = "ATGCT"

# A function for determining the score between any two bases in alignment
def match_score(alpha, beta):
    if alpha == beta:
        return match_award
    elif alpha == '-' or beta == '-':
        return gap_penalty
    else:
        return mismatch_penalty

# The function that actually fills out the matrix of scores
def needleman_wunsch(seq1, seq2):
    
    # lengths of the two sequences
    n = len(seq1) 
    m = len(seq2)
    
    # generate matrix of zeros to store scores
    score = zeros(m+1, n+1)
    
    ########################
    # Your code starts here
    ########################
    
    # Use the following steps as a guide to calculate the score matrix
    
    # 1. Fill out first column
    
    # 2. Fill out first row
    
    # 3. Fill out all other values in the score matrix

    # Return the final score matrix
    return score

# Test out the needleman_wunsch() function
print_matrix(needleman_wunsch(seq1, seq2))

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

[0	0	0	0	0	0	0]

If this was easy¶

Problem 2:

Modify your code from Problem 1 to back-trace through the score matrix and print out the final alignment. Hint: For the back-tracing, you'll want to use a while loop (or several of them).

def nw_backtrace(seq1, seq2):
    # Calculate score table using the function we wrote earlier
    score = needleman_wunsch(seq1, seq2)
    
    # Create variables to store alignment
    align1 = ""
    align2 = ""
    
    ########################
    # Your code starts here
    ########################
    
    # Back-trace from bottom right of the score matrix
    # and compute the alignment 
    
    return(align1, align2)

output1, output2 = nw_backtrace(seq1, seq2)

print(output1 + "\n" + output2)