{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab Worksheet 13\n",
"\n",
"For this assignment, your goal is to implement the Needleman-Wunsch algorithm in Python. You can read more about the Needleman-Wunsch algorithm on [Wikipedia](https://en.wikipedia.org/wiki/Needlemanâ€“Wunsch_algorithm). The Wikipedia page contains psuedo-code which you might find helpful.\n",
"\n",
"## Part 1\n",
"\n",
"Write a function that takes two sequences as input, and returns a matrix of scores as we saw in Class 25. You **do not** have to do the back-tracing, just fill out the matrix. \n",
"\n",
"To get you started, a matrix can be represented in Python as a list of lists. Let's say we want to make a matrix that looks like this:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1\t3\t5\t7]\n",
"\n",
"[2\t3\t4\t5]\n",
"\n",
"[5\t2\t20\t3]\n",
"\n",
"The value in the 2nd row and the 0th column is: 5\n"
]
}
],
"source": [
"# Here's how to make the matrix above from a list of lists\n",
"my_matrix = []\n",
"# Fill out the 0th row\n",
"my_matrix.append([1, 3, 5, 7])\n",
"# Fill out the 1st row\n",
"my_matrix.append([2, 3, 4, 5])\n",
"# Fill out the 2nd row\n",
"my_matrix.append([5, 2, 20, 3])\n",
"\n",
"# Here is a helper function to print out matrices\n",
"def print_matrix(mat):\n",
" # Loop over all rows\n",
" for i in range(0, len(mat)):\n",
" print(\"[\", end = \"\")\n",
" # Loop over each column in row i\n",
" for j in range(0, len(mat[i])):\n",
" # Print out the value in row i, column j\n",
" print(mat[i][j], end = \"\")\n",
" # Only add a tab if we're not in the last column\n",
" if j != len(mat[i]) - 1:\n",
" print(\"\\t\", end = \"\")\n",
" print(\"]\\n\")\n",
"\n",
"print_matrix(my_matrix)\n",
"\n",
"# To retrieve the value from the 2nd row, in the 0th column, is relatively simple:\n",
"print(\"The value in the 2nd row and the 0th column is:\", my_matrix[2][0])\n",
"\n",
"# The format is always my_matrix[row][column]. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part 1 Hints\n",
"\n",
"Break the problem down into as many small steps as possible. Here are a few hints:\n",
"\n",
"* Before you calculate any scores, make an empty matrix of the appropriate size using the `zeros()` function defined below.\n",
"* Fill out the 0th row and 0th column before you calculate any other scores.\n",
"* The `max()` function will return the maximum value from a list of values. For example `max(1,7,3)` will return `7`.\n",
"* Make liberal use of the `range()` function.\n",
"* Use the `print_matrix()` function to print out your matrix as frequently as possible. Always make sure that your code is doing what you think it's doing!\n",
"* Remember, in Python, we start counting from 0.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0\t0\t0\t0\t0\t0\t0]\n",
"\n",
"[0\t0\t0\t0\t0\t0\t0]\n",
"\n",
"[0\t0\t0\t0\t0\t0\t0]\n",
"\n",
"[0\t0\t0\t0\t0\t0\t0]\n",
"\n",
"[0\t0\t0\t0\t0\t0\t0]\n",
"\n",
"[0\t0\t0\t0\t0\t0\t0]\n",
"\n"
]
}
],
"source": [
"# Use these values to calculate scores\n",
"gap_penalty = -1\n",
"match_award = 1\n",
"mismatch_penalty = -1\n",
"\n",
"# Make a score matrix with these two sequences\n",
"seq1 = \"ATTACA\"\n",
"seq2 = \"ATGCT\"\n",
"\n",
"# A function for making a matrix of zeroes\n",
"def zeros(rows, cols):\n",
" # Define an empty list\n",
" retval = []\n",
" # Set up the rows of the matrix\n",
" for x in range(rows):\n",
" # For each row, add an empty list\n",
" retval.append([])\n",
" # Set up the columns in each row\n",
" for y in range(cols):\n",
" # Add a zero to each column in each row\n",
" retval[-1].append(0)\n",
" # Return the matrix of zeros\n",
" return retval\n",
"\n",
"# A function for determining the score between any two bases in alignment\n",
"def match_score(alpha, beta):\n",
" if alpha == beta:\n",
" return match_award\n",
" elif alpha == '-' or beta == '-':\n",
" return gap_penalty\n",
" else:\n",
" return mismatch_penalty\n",
"\n",
"# The function that actually fills out a matrix of scores\n",
"def needleman_wunsch(seq1, seq2):\n",
" \n",
" # length of two sequences\n",
" n = len(seq1) \n",
" m = len(seq2)\n",
" \n",
" # Generate matrix of zeros to store scores\n",
" score = zeros(m+1, n+1)\n",
" \n",
" ########################\n",
" # Your code starts here\n",
" ########################\n",
" \n",
" # Use the following steps as a guide to calculate the score matrix\n",
" \n",
" # 1. Fill out first column\n",
" \n",
" # 2. Fill out first row\n",
" \n",
" # 3. Fill out all other values in the score matrix\n",
"\n",
" # Return the final score matrix\n",
" return score\n",
"\n",
"# Test out the needleman_wunsch() function\n",
"print_matrix(needleman_wunsch(seq1, seq2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 2: If that was easy...\n",
"\n",
"Modify your code from Part 1 to back-trace through the score matrix and print out the final alignment. **HINT:** For the back-tracing, you'll want to use a `while` loop (or several of them)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n"
]
}
],
"source": [
"def needleman_wunsch(seq1, seq2):\n",
" \n",
" # Store length of two sequences\n",
" n = len(seq1) \n",
" m = len(seq2)\n",
" \n",
" # Generate matrix of zeros to store scores\n",
" score = zeros(m+1, n+1)\n",
" \n",
" # Calculate score table\n",
" \n",
" # Create variables to store alignment\n",
" align1 = \"\"\n",
" align2 = \"\"\n",
" \n",
" # Back-trace from bottom right of the score matrix and compute the alignment \n",
" \n",
" return(align1, align2)\n",
"\n",
"output1, output2 = needleman_wunsch(seq1, seq2)\n",
"\n",
"print(output1 + \"\\n\" + output2)\n",
" \n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}