Enter your name and EID here
This homework is due on April 16, 2019 at 4:00pm. Please submit as a PDF file on Canvas. Before submission, please re-run all cells by clicking "Kernel" and selecting "Restart & Run All."
Problem 1 (2 points): Using Biopython and the PubMed database, calculate the average number of papers per year that Dr. Wilke published from 2015 to 2019 (inclusive, so that's 5 years total).
Hints: Dr. Wilke will always appear as "Wilke CO" in the PubMed database. Also, make sure to set the retmax argument to at least 50 in Entrez.esearch() so that you retrieve all of the papers.
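As a rough sketch of how the retmax hint might be applied, the snippet below splits the task into small helpers. The helper names (build_term, count_papers, average_per_year) are hypothetical, and the [AU] and [DP] field tags with a colon-separated year range are an assumption about how the PubMed query could be phrased. count_papers needs Biopython, a network connection, and Entrez.email to be set, so it is defined here but not called.

```python
def build_term(author, first_year, last_year):
    # Assumed PubMed query syntax: author field tag plus a publication-date range.
    return f"{author}[AU] AND {first_year}:{last_year}[DP]"


def average_per_year(n_papers, first_year, last_year):
    # The year range is inclusive, so 2015 to 2019 spans 5 years.
    n_years = last_year - first_year + 1
    return n_papers / n_years


def count_papers(term, retmax=50):
    # Imported here so the helpers above work without Biopython installed.
    from Bio import Entrez

    # retmax caps how many IDs come back; too small a value silently truncates IdList.
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return len(record["IdList"])


term = build_term("Wilke CO", 2015, 2019)
print(term)
```

One way to check that retmax was large enough is to compare the length of record["IdList"] with the total reported in record["Count"]; if the two differ, the ID list was cut off.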
# You will need Entrez and Medline to solve this problem
from Bio import Entrez, Medline
Entrez.email = "your email goes here"
# Your code goes here
Problem 2 (4 points): From the years 2015-2019 (inclusive), how many different co-authors did Dr. Wilke publish with and how many times did Dr. Wilke publish a paper with each co-author? Print out each co-author and the number of times Dr. Wilke published a paper with that co-author. Make sure you don't print the same co-author's name twice.
Hint: In class 21, we parsed the results of a literature search with Medline.parse(). This allows us to look at the references we found and to retrieve different parts of the reference with a key. For example, to retrieve the abstract, we would write record['AB']. You can find a list of possible keys here.
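The key-based access described in the hint can be illustrated offline. The sketch below assumes each parsed record behaves like a dictionary whose 'AU' key holds a list of author names; the records and author names here are made up for illustration, and count_coauthors is a hypothetical helper, not part of Biopython.

```python
from collections import Counter


def count_coauthors(records, self_name="Wilke CO"):
    counts = Counter()
    for record in records:
        # Some records may lack an author list, so fall back to an empty one.
        authors = set(record.get("AU", []))
        # Drop the focal author so only co-authors are tallied.
        authors.discard(self_name)
        counts.update(authors)
    return counts


# Made-up stand-ins for records produced by Medline.parse()
records = [
    {"AU": ["Doe J", "Wilke CO"]},
    {"AU": ["Doe J", "Roe R", "Wilke CO"]},
    {"TI": "A record without an author list"},
]

counts = count_coauthors(records)
for name, n in sorted(counts.items()):
    print(name, n)
```

Because a Counter keeps one entry per name, each co-author is printed only once, however many papers they share.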
# Your code goes here
Problem 3 (4 points): From 2015-2019 (inclusive), how many of Dr. Wilke's papers contain the terms "evolution" or "evolutionary" in the abstract? Use Python and regular expressions to find an answer.
Hint: In a regular expression, you can match the same word with slightly different endings using the "|" (or) operator. For example, the regex "bacteri(a|um)" would match both "bacteria" and "bacterium".
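A small offline sketch of this kind of matching is shown below. The abstract strings are invented for illustration, count_matching is a hypothetical helper, and the choices to ignore case and to require word boundaries are assumptions rather than requirements of the problem.

```python
import re

# The pattern from the hint: one stem with two alternative endings.
hint_pattern = re.compile(r"bacteri(a|um)")

# An optional suffix is another way to allow two endings of the same word.
# \b keeps the match to whole words; IGNORECASE also catches "Evolution".
evo_pattern = re.compile(r"\bevolution(ary)?\b", re.IGNORECASE)


def count_matching(abstracts, pattern):
    # Count each abstract at most once, however many times the pattern occurs in it.
    return sum(1 for text in abstracts if pattern.search(text))


# Invented abstracts used only to exercise the patterns
abstracts = [
    "Evolution of protein structure is constrained by folding.",
    "We take an evolutionary perspective on viral genomes.",
    "A method for visualizing tabular data.",
]

n_matching = count_matching(abstracts, evo_pattern)
print(n_matching)
```

With real records, some entries may have no abstract at all, so something like record.get('AB', '') avoids a KeyError when building the list of abstracts.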
# You'll need the module re for regular expressions
import re
# Your code goes here