Homework 9

Enter your name and EID here

This homework is due on April 10, 2018, at 7:00 pm. Please submit it as a PDF file on Canvas. Before submitting, re-run all cells by clicking "Kernel" and selecting "Restart & Run All."

Problem 1 (2 points): Using Biopython and the PubMed database, calculate the average number of papers per year that Dr. Wilke published from 2013 to 2017 (inclusive, so five years in total).

Hints: Dr. Wilke always appears as "Wilke CO" in the PubMed database. Also, make sure to set the retmax argument of Entrez.esearch() to at least 60 so that you retrieve all of his papers.
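Once the search results have been parsed, the averaging step is just "count the papers in the window, then divide by the number of years." The sketch below illustrates that step on hypothetical stand-in records, not real PubMed data. It assumes each parsed record is dict-like and has a "DP" (date of publication) field whose first four characters are the year; records missing "DP" or using another date layout would need extra handling.

```python
# Hypothetical stand-ins for parsed MEDLINE records (NOT real PubMed data).
# Assumption: every record has a "DP" field that begins with a 4-digit year.
records = [
    {"TI": "Paper A", "DP": "2013 Jan"},
    {"TI": "Paper B", "DP": "2015 Jun 3"},
    {"TI": "Paper C", "DP": "2015"},
    {"TI": "Paper D", "DP": "2017 Dec"},
    {"TI": "Paper E", "DP": "2012 Mar"},  # outside the window, so ignored
]

first_year, last_year = 2013, 2017
n_years = last_year - first_year + 1  # inclusive range, so 5 years

# Keep only papers whose publication year falls inside the window.
in_window = [
    r for r in records
    if first_year <= int(r["DP"][:4]) <= last_year
]

# Average number of papers per year over the whole window.
average_per_year = len(in_window) / n_years
print(f"{len(in_window)} papers over {n_years} years: {average_per_year} per year")
```

Dividing by the number of years in the window, rather than by the number of years that happen to have papers, keeps a year with zero papers in the average.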

In [1]:
# You will need Entrez and Medline to solve this problem
from Bio import Entrez, Medline

Entrez.email = "your email goes here"

# Your code goes here

Problem 2 (4 points): From 2013 to 2017 (inclusive), in which journals did Dr. Wilke's papers appear, and how many of his papers appeared in each journal? Print each journal once, along with the number of papers that appeared in it.

Hint: In class 21, we parsed the results of a literature search with Medline.parse(). This lets us step through the references we found and retrieve different parts of each reference with a key. For example, to retrieve the abstract, we would write record['AB']. You can find a list of possible keys here.
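Because each parsed record can be read by key, tallying journals amounts to counting one field across all records. The sketch below uses collections.Counter on hypothetical stand-in records with made-up journal names. It assumes the journal is read from the "TA" key (the MEDLINE journal title abbreviation; the full title, when present, is under "JT"). The "unknown" fallback for records without that key is an illustrative choice, not something the assignment specifies.

```python
from collections import Counter

# Hypothetical stand-ins for records returned by Medline.parse()
# (made-up journal names, NOT real PubMed data).
records = [
    {"TI": "Paper A", "TA": "Journal A"},
    {"TI": "Paper B", "TA": "Journal B"},
    {"TI": "Paper C", "TA": "Journal A"},
    {"TI": "Paper D"},  # no "TA" key; counted under "unknown"
]

# Counter adds one per paper for its journal. Its keys are unique,
# so each journal name is printed exactly once below.
journal_counts = Counter(r.get("TA", "unknown") for r in records)

for journal, count in journal_counts.most_common():
    print(f"{journal}: {count}")
```

A plain dictionary updated in a loop would work just as well. Counter is used here because most_common() also sorts the journals from most to least frequent.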

In [2]:
# Your code goes here

Problem 3 (4 points): From 2013 to 2017 (inclusive), how many of Dr. Wilke's papers contain the term "virus" or "viral" in the title? Use Python and regular expressions to find the answer.

Hint: In a regular expression, you can match the same word with slightly different endings using the "|" (or) operator. For example, the regex "bacteri(a|um)" would match both "bacteria" and "bacterium".
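The hint's "bacteri(a|um)" example can be tried directly. The sketch below counts how many titles in a small made-up list match the pattern. Two details are assumptions worth checking against your own data. First, re.IGNORECASE is added so that a capitalized word at the start of a title still matches. Second, re.search is used because it scans the whole string, whereas re.match only checks the beginning.

```python
import re

# The alternation "(a|um)" lets one pattern cover both word endings.
# re.IGNORECASE also catches "Bacteria" at the start of a title.
pattern = re.compile(r"bacteri(a|um)", re.IGNORECASE)

# Made-up titles for illustration only.
titles = [
    "Growth of a single bacterium",
    "Bacteria in soil",
    "Protein evolution in yeast",
]

# re.search looks anywhere in the title, not just at its start.
matches = [t for t in titles if pattern.search(t)]
print(len(matches))  # 2
```

Without word boundaries, this pattern also matches inside longer words such as "bacterial". Writing it as r"\bbacteri(a|um)\b" would restrict it to the exact words.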

In [3]:
# You'll need the module re for regular expressions
import re

# Your code goes here