Lab Worksheet

Problem 1: Write a function that searches the NCBI Nucleotide database for a given search term. The function should take three parameters: an email address, a search term, and the maximum number of results to return. It should return a list of GenBank accession numbers.

In [1]:
# you will need Entrez and SeqIO
from Bio import Entrez, SeqIO
In [2]:
def get_accessions(email, search_term, max_return):

    Entrez.email = email
    # your code here
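
A minimal sketch of one way to complete `get_accessions`, assuming Biopython's `Entrez.esearch` and `Entrez.read`. By default, ESearch on the nucleotide database returns numeric UIDs (GI numbers) rather than accessions. Passing `idtype="acc"` asks NCBI for accession.version strings instead, which matches what the problem asks for. Calling the function makes a live request to NCBI.

```python
from Bio import Entrez


def get_accessions(email, search_term, max_return):
    """Search NCBI Nucleotide and return up to `max_return` accession numbers."""
    # NCBI asks every E-utilities user to identify themselves with an email
    Entrez.email = email

    # idtype="acc" requests accession.version strings instead of numeric UIDs
    handle = Entrez.esearch(db="nucleotide", term=search_term,
                            retmax=max_return, idtype="acc")
    try:
        results = Entrez.read(handle)
    finally:
        handle.close()

    # "IdList" holds the matching identifiers as strings
    return list(results["IdList"])
```

For example, `get_accessions("you@example.com", "Homo sapiens mitochondrion complete genome", 5)` (a placeholder email and an illustrative search term) would return at most five accession strings, in the form the next problem expects as input.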

Problem 2: Write a function that takes a list of GenBank accession numbers (obtained from your function above) and fetches the associated records from the NCBI Nucleotide database. The function should take three parameters: an email address, a list of GenBank accessions, and the name of an output file. It should write a single file containing the complete GenBank record for every accession in the list.

In [3]:
def write_gb_file(email, accession_list, outfile_name):

    Entrez.email = email
    # your code here
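
A minimal sketch of `write_gb_file`, assuming Biopython's `Entrez.efetch`. Requesting `rettype="gb"` with `retmode="text"` returns GenBank flat-file text. This sketch writes that text to disk unchanged, so every field of each record is preserved. The accessions are joined into one comma-separated ID string, which EFetch accepts.

```python
from Bio import Entrez


def write_gb_file(email, accession_list, outfile_name):
    """Fetch GenBank records for `accession_list` and write them to `outfile_name`."""
    # NCBI asks every E-utilities user to identify themselves with an email
    Entrez.email = email

    # One EFetch call for all IDs; "gb" + "text" gives GenBank flat-file format
    handle = Entrez.efetch(db="nucleotide", id=",".join(accession_list),
                           rettype="gb", retmode="text")
    try:
        records_text = handle.read()
    finally:
        handle.close()

    # Write the raw text so no part of any record is dropped or reformatted
    with open(outfile_name, "w") as outfile:
        outfile.write(records_text)
```

Writing the raw response, rather than parsing it with `SeqIO` and writing it back out, avoids any round-trip loss of fields. For long accession lists, it is gentler on NCBI to fetch in smaller batches, or to upload the IDs with `Entrez.epost` and use the History server. That is omitted here for brevity.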

Problem 3: Write a function that takes a GenBank file and counts the tRNA features in each record of that file. For each record, the function should find the entries in record.features whose type is "tRNA" and count them. The function should take one parameter: the name of a file containing one or more GenBank records. It should return a dictionary mapping each record's accession to its tRNA count.

In [4]:
def get_tRNA_counts(gb_file):
    
    with open(gb_file, 'r') as infile:
        
        # use `SeqIO.parse` to read a file with many GenBank records
        all_records = SeqIO.parse(infile, "genbank")
        count_dict = {} # dictionary to hold the tRNA counts for each GenBank accession
        
        # your code here
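
One possible completion of `get_tRNA_counts`, assuming each record is keyed by `record.id`. For GenBank input, Biopython typically sets `record.id` from the VERSION line, giving an accession.version string. A generator expression counts the features whose `.type` is exactly `"tRNA"`, so records with no tRNA features get a count of 0.

```python
from Bio import SeqIO


def get_tRNA_counts(gb_file):
    """Return a dict mapping each record's accession to its number of tRNA features."""
    count_dict = {}  # accession -> tRNA feature count
    with open(gb_file, 'r') as infile:
        # SeqIO.parse yields one SeqRecord per record in a multi-record file
        for record in SeqIO.parse(infile, "genbank"):
            # Each SeqFeature has a .type string such as "gene", "CDS", or "tRNA"
            n_trna = sum(1 for feature in record.features if feature.type == "tRNA")
            count_dict[record.id] = n_trna
    return count_dict
```

Chained with the earlier problems, a run might look like `write_gb_file(email, get_accessions(email, term, 10), "records.gb")` followed by `get_tRNA_counts("records.gb")`. Here `email`, `term`, and `"records.gb"` are placeholders.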