April 28, 2020
In this class, our goal is to implement the Needleman-Wunsch algorithm in Python. You can read more about the Needleman-Wunsch algorithm on Wikipedia. The Wikipedia page contains pseudo-code which you might find helpful.
Since the Needleman-Wunsch algorithm works on a matrix of scores, it is helpful to discuss how to represent a matrix in Python. We can do so by creating a list of lists. Let's say we want to make a matrix that looks like this:
1 | 3 | 5 | 7 |
2 | 3 | 4 | 5 |
5 | 2 | 20 | 3 |
In Python, we would create it as follows:
my_matrix = []
# Fill out the 0th row
my_matrix.append([1, 3, 5, 7])
# Fill out the 1st row
my_matrix.append([2, 3, 4, 5])
# Fill out the 2nd row
my_matrix.append([5, 2, 20, 3])
print(my_matrix)
However, when we print the matrix, it is not easy to read, because it is shown as a list of lists. So it is helpful to define a function print_matrix()
that prints a matrix in a nicer form:
def print_matrix(mat):
# Loop over all rows
for i in range(0, len(mat)):
print("[", end = "")
# Loop over each column in row i
for j in range(0, len(mat[i])):
# Print out the value in row i, column j
print(mat[i][j], end = "")
# Only add a tab if we're not in the last column
if j != len(mat[i]) - 1:
print("\t", end = "")
print("]\n")
print_matrix(my_matrix)
To retrieve an indiviual value from a matrix, we use code of the form my_matrix[row][column]
. So, to retrieve for example the value in the 0th column of the 2nd row, we write:
print("The value in the 2nd row and the 0th column is:", my_matrix[2][0])
It will also be helpful to have a function that can create a matrix of zeros.
# A function for making a matrix of zeroes
def zeros(rows, cols):
return [[0 for i in range(cols)] for j in range(rows)]
zero_matrix = zeros(3, 5) # make zero matrix with 3 rows and 5 columns
print_matrix(zero_matrix)
Problem 1:
Write a function that takes two sequences as input and returns a matrix of Needleman-Wunsch scores. You do not have to do the back-tracing, just fill out the matrix.
Break the problem down into as many small steps as possible. Here are a few hints:
zeros()
function defined below.max()
function will return the maximum value from a list of values. For example max(1, 7, 3)
will return 7
.range()
function.print_matrix()
function to print out your matrix as frequently as possible. Always make sure that your code is doing what you think it's doing!# Use these values to calculate scores
gap_penalty = -1
match_award = 1
mismatch_penalty = -1
# Make a score matrix with these two sequences
seq1 = "ATTACA"
seq2 = "ATGCT"
# A function for determining the score between any two bases in alignment
def match_score(alpha, beta):
if alpha == beta:
return match_award
elif alpha == '-' or beta == '-':
return gap_penalty
else:
return mismatch_penalty
# The function that actually fills out the matrix of scores
def needleman_wunsch(seq1, seq2):
# lengths of the two sequences
n = len(seq1)
m = len(seq2)
# generate matrix of zeros to store scores
score = zeros(m+1, n+1)
########################
# Your code starts here
########################
# Use the following steps as a guide to calculate the score matrix
# 1. Fill out first column
# 2. Fill out first row
# 3. Fill out all other values in the score matrix
# Return the final score matrix
return score
# Test out the needleman_wunsch() function
print_matrix(needleman_wunsch(seq1, seq2))
Problem 2:
Modify your code from Problem 1 to back-trace through the score matrix and print out the final alignment. Hint: For the back-tracing, you'll want to use a while
loop (or several of them).
def nw_backtrace(seq1, seq2):
# Calculate score table using the function we wrote earlier
score = needleman_wunsch(seq1, seq2)
# Create variables to store alignment
align1 = ""
align2 = ""
########################
# Your code starts here
########################
# Back-trace from bottom right of the score matrix
# and compute the alignment
return(align1, align2)
output1, output2 = nw_backtrace(seq1, seq2)
print(output1 + "\n" + output2)