Enter your name and EID here
After completing this Jupyter notebook, please convert it to pdf and submit both the pdf and the original notebook on Canvas no later than 12:00 pm on May 7, 2020. The two documents will be graded jointly, so they must be consistent (as in, don't change the Jupyter notebook without also updating the pdf!).
All results presented must have corresponding code. Any answers/results given without the generative Python code will be considered absent. All code reported in your final project document should work properly.
Before submitting the Jupyter notebook part, please re-run all cells by clicking "Kernel" and selecting "Restart & Run All."
The project consists of two problems. For both problems, please follow these guidelines:
(50 pts) Since the start of SARS-CoV-2 pandemic, the number of academic publications concerning the virus has increased significantly. Using Python, answer the following questions:
Hints: Set retmax = 10000
to ensure you get all of the publications for your searches. In your search terms, do not restrict the publication date; instead, download all records that match the search term and extract the publication date from the records. You should end up with two PMID lists, one for SARS-CoV-2 and one for H1N1. To get the date of publication, use record['DP']
. To extract the year of publication from the full date of publication record['DP']
, you'll need to use a regular expression. As an example, record['DP']
could be either 2017 Feb 12
or Winter 2017
; your code should match 2017
in both cases.
Approach: Provide a brief description (1-2 paragraphs) of your strategy for answering the above questions.
# you will need the following libraries to answer these questions
from Bio import Entrez, Medline
import re
# your code goes here
# your code goes here
# your code goes here
Discussion: Provide a brief conclusion (1-2 paragraphs) explaining what you have learned about this question from your code.
(50 pts)
Ask one bioinformatics question using the resources we've discussed in class (e.g., a literature search or a genomic query for your favorite organism). You may use the literature search you performed above, but query something different about the publication list. Then, write Python code to answer your question.
For full credit, the answer code must meet the following conditions:
for
loopif
statementQuestion: Your question goes here.
Approach: Provide a brief description (1-2 paragraphs) of your strategy for answering the above question.
# your code goes here
# your code goes here
# your code goes here
Discussion: Provide a brief conclusion (1-2 paragraphs) explaining what you have learned about your question from your code.