Enter your name and EID here
This homework is due on April 10, 2018 at 7:00pm. Please submit as a PDF file on Canvas. Before submission, please re-run all cells by clicking "Kernel" and selecting "Restart & Run All."
Problem 1 (2 points): Using Biopython and the PubMed database, calculate the average number of papers per year that Dr. Wilke published from 2013 to 2017 (inclusive, so that's 5 years total).
Hints: Dr. Wilke will always appear as "Wilke CO" in the PubMed database. Also, make sure to set the retmax argument of Entrez.esearch() high enough that you retrieve all of the papers.
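To illustrate the retmax hint, here is a minimal sketch of a search helper. The function name search_pubmed_ids and the default of 1000 are assumptions made for this example, not values given in the assignment; the point is only that Entrez.esearch() returns a limited number of IDs unless retmax is raised.

```python
from Bio import Entrez

Entrez.email = "your email goes here"  # NCBI asks for a contact address


def search_pubmed_ids(term, retmax=1000):
    """Return (total_count, id_list) for a PubMed query.

    retmax=1000 is an illustrative default, not a value from the assignment.
    """
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    # "Count" is the total number of hits; "IdList" holds at most retmax IDs.
    return int(result["Count"]), list(result["IdList"])
```

If the returned count is larger than the length of the ID list, retmax was too small and some papers were left out.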
# You will need Entrez and Medline to solve this problem
from Bio import Entrez, Medline
Entrez.email = "your email goes here"

# Your code goes here
Problem 2 (4 points): In which journals did Dr. Wilke's papers appear from 2013 to 2017 (inclusive), and how many of his papers appeared in each journal? Print out each journal and the number of papers that appeared in it. Make sure you don't print the same journal name twice.
Hint: In class 21, we parsed the results of a literature search with Medline.parse(). This allows us to look at the references we found and to retrieve different parts of the reference with a key. For example, to retrieve the abstract, we would write record['AB']. You can find a list of possible keys here.
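As a small illustration of key-based access, the sketch below parses two invented MEDLINE-format records from an in-memory string instead of a live search. The record contents are made up for this example; note that a key such as 'AB' can be missing from a record, so .get() is the safer way to read it.

```python
import io

from Bio import Medline

# Two invented records in MEDLINE text format (not real PubMed entries).
fake_medline = (
    "PMID- 00000001\n"
    "TI  - A made-up title for illustration.\n"
    "JT  - Journal of Placeholder Examples\n"
    "AB  - This abstract is invented.\n"
    "\n"
    "PMID- 00000002\n"
    "TI  - Another made-up title.\n"
    "JT  - Journal of Placeholder Examples\n"
)

# Medline.parse() yields one dictionary-like record per reference.
records = list(Medline.parse(io.StringIO(fake_medline)))

titles = [record["TI"] for record in records]
# The second record has no abstract, so fall back to an empty string.
abstracts = [record.get("AB", "") for record in records]

for title, abstract in zip(titles, abstracts):
    print(title, "|", abstract)
```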
# Your code goes here
Problem 3 (4 points): From 2013 to 2017 (inclusive), how many of Dr. Wilke's papers contain the term "virus" or "viral" in the title? Use Python and regular expressions to find an answer.
Hint: In a regular expression, you can match the same word with slightly different endings using the "|" (or) operator. For example, the regex "bacteri(a|um)" would match both "bacteria" and "bacterium".
# You'll need the module re for regular expressions
import re

# Your code goes here