{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab Worksheet 13\n", "\n", "For this assignment, your goal is to implement the Needleman-Wunsch algorithm in Python. You can read more about the Needleman-Wunsch algorithm on [Wikipedia](https://en.wikipedia.org/wiki/Needleman–Wunsch_algorithm). The Wikipedia page contains psuedo-code which you might find helpful.\n", "\n", "## Part 1\n", "\n", "Write a function that takes two sequences as input, and returns a matrix of scores as we saw in Class 25. You **do not** have to do the back-tracing, just fill out the matrix. \n", "\n", "To get you started, a matrix can be represented in Python as a list of lists. Let's say we want to make a matrix that looks like this:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
 1 3 5 7 2 3 4 5 5 2 20 3
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1\t3\t5\t7]\n", "\n", "[2\t3\t4\t5]\n", "\n", "[5\t2\t20\t3]\n", "\n", "The value in the 2nd row and the 0th column is: 5\n" ] } ], "source": [ "# Here's how to make the matrix above from a list of lists\n", "my_matrix = []\n", "# Fill out the 0th row\n", "my_matrix.append([1, 3, 5, 7])\n", "# Fill out the 1st row\n", "my_matrix.append([2, 3, 4, 5])\n", "# Fill out the 2nd row\n", "my_matrix.append([5, 2, 20, 3])\n", "\n", "# Here is a helper function to print out matrices\n", "def print_matrix(mat):\n", " # Loop over all rows\n", " for i in range(0, len(mat)):\n", " print(\"[\", end = \"\")\n", " # Loop over each column in row i\n", " for j in range(0, len(mat[i])):\n", " # Print out the value in row i, column j\n", " print(mat[i][j], end = \"\")\n", " # Only add a tab if we're not in the last column\n", " if j != len(mat[i]) - 1:\n", " print(\"\\t\", end = \"\")\n", " print(\"]\\n\")\n", "\n", "print_matrix(my_matrix)\n", "\n", "# To retrieve the value from the 2nd row, in the 0th column, is relatively simple:\n", "print(\"The value in the 2nd row and the 0th column is:\", my_matrix)\n", "\n", "# The format is always my_matrix[row][column]. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 1 Hints\n", "\n", "Break the problem down into as many small steps as possible. Here are a few hints:\n", "\n", "* Before you calculate any scores, make an empty matrix of the appropriate size using the `zeros()` function defined below.\n", "* Fill out the 0th row and 0th column before you calculate any other scores.\n", "* The `max()` function will return the maximum value from a list of values. For example `max(1,7,3)` will return `7`.\n", "* Make liberal use of the `range()` function.\n", "* Use the `print_matrix()` function to print out your matrix as frequently as possible. Always make sure that your code is doing what you think it's doing!\n", "* Remember, in Python, we start counting from 0.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0\t0\t0\t0\t0\t0\t0]\n", "\n", "[0\t0\t0\t0\t0\t0\t0]\n", "\n", "[0\t0\t0\t0\t0\t0\t0]\n", "\n", "[0\t0\t0\t0\t0\t0\t0]\n", "\n", "[0\t0\t0\t0\t0\t0\t0]\n", "\n", "[0\t0\t0\t0\t0\t0\t0]\n", "\n" ] } ], "source": [ "# Use these values to calculate scores\n", "gap_penalty = -1\n", "match_award = 1\n", "mismatch_penalty = -1\n", "\n", "# Make a score matrix with these two sequences\n", "seq1 = \"ATTACA\"\n", "seq2 = \"ATGCT\"\n", "\n", "# A function for making a matrix of zeroes\n", "def zeros(rows, cols):\n", " # Define an empty list\n", " retval = []\n", " # Set up the rows of the matrix\n", " for x in range(rows):\n", " # For each row, add an empty list\n", " retval.append([])\n", " # Set up the columns in each row\n", " for y in range(cols):\n", " # Add a zero to each column in each row\n", " retval[-1].append(0)\n", " # Return the matrix of zeros\n", " return retval\n", "\n", "# A function for determining the score between any two bases in alignment\n", "def match_score(alpha, beta):\n", " if alpha == beta:\n", " return match_award\n", " elif alpha == '-' or beta == '-':\n", " return gap_penalty\n", " else:\n", " return mismatch_penalty\n", "\n", "# The function that actually fills out a matrix of scores\n", "def needleman_wunsch(seq1, seq2):\n", " \n", " # length of two sequences\n", " n = len(seq1) \n", " m = len(seq2)\n", " \n", " # Generate matrix of zeros to store scores\n", " score = zeros(m+1, n+1)\n", " \n", " ########################\n", " # Your code starts here\n", " ########################\n", " \n", " # Use the following steps as a guide to calculate the score matrix\n", " \n", " # 1. Fill out first column\n", " \n", " # 2. Fill out first row\n", " \n", " # 3. Fill out all other values in the score matrix\n", "\n", " # Return the final score matrix\n", " return score\n", "\n", "# Test out the needleman_wunsch() function\n", "print_matrix(needleman_wunsch(seq1, seq2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: If that was easy...\n", "\n", "Modify your code from Part 1 to back-trace through the score matrix and print out the final alignment. **HINT:** For the back-tracing, you'll want to use a `while` loop (or several of them)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n" ] } ], "source": [ "def needleman_wunsch(seq1, seq2):\n", " \n", " # Store length of two sequences\n", " n = len(seq1) \n", " m = len(seq2)\n", " \n", " # Generate matrix of zeros to store scores\n", " score = zeros(m+1, n+1)\n", " \n", " # Calculate score table\n", " \n", " # Create variables to store alignment\n", " align1 = \"\"\n", " align2 = \"\"\n", " \n", " # Back-trace from bottom right of the score matrix and compute the alignment \n", " \n", " return(align1, align2)\n", "\n", "output1, output2 = needleman_wunsch(seq1, seq2)\n", "\n", "print(output1 + \"\\n\" + output2)\n", " \n", " \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 0 }