Last week, we talked about how to use Entrez to access genomic information from the NCBI database. This week, we're focusing on how to use Entrez and Medline to search the PubMed (literature) database. For this module, a list of important abbreviations and their meanings can be found here: https://www.nlm.nih.gov/bsd/mms/medlineelements.html
Problem 1:
(a) Download the Medline record for the publication with PubMed ID 32191846 and parse it with the Medline.parse() function. Then print all key-value pairs returned in that record.
(b) Use an Entrez esearch query of the pubmed database to find out how many publications "Marcotte EM" wrote in 2020.
(c) From the results of part (b), compile a dictionary of all the publication titles and abstracts for "Marcotte EM" in 2020. Print each publication title, followed by that paper's abstract.
# Problem 1a
from Bio import Entrez, Medline
Entrez.email = "rachaelcox@utexas.edu"
handle = Entrez.efetch(db="pubmed", id='32191846', rettype="medline", retmode="text")
records = Medline.parse(handle) ## Hint
record = list(records)[0] ## Hint
handle.close()
for key in record.keys():
    print(key + ":", record[key])
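To make the key-value structure from part (a) more concrete, here is a rough, pure-Python sketch of how MEDLINE-formatted text maps onto a dictionary. It is not Biopython's implementation: the function name `parse_medline_text`, the `LIST_TAGS` set, and the sample record are all made up for illustration, and the real Medline.parse() handles many more tags and edge cases.

```python
# Simplified sketch of MEDLINE tag parsing (an illustration, not Biopython's code).
LIST_TAGS = {"AU", "FAU", "MH"}  # assumption: tags treated as repeatable in this sketch


def parse_medline_text(text):
    """Turn one MEDLINE-formatted record into a dict of tag -> value."""
    parsed_record = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[:4].strip() and line[4:6] == "- ":
            # New field: a tag of up to four characters, then "- ", then the value
            tag = line[:4].strip()
            value = line[6:].strip()
            if tag in LIST_TAGS:
                parsed_record.setdefault(tag, []).append(value)
            else:
                parsed_record[tag] = value
        elif tag is not None:
            # Continuation line: append the text to the most recent field
            extra = line.strip()
            if tag in LIST_TAGS:
                parsed_record[tag][-1] += " " + extra
            else:
                parsed_record[tag] += " " + extra
    return parsed_record


# Made-up record, for illustration only
sample_text = """PMID- 00000000
TI  - An example title that wraps
      onto a second line.
AU  - Doe J
AU  - Roe R
"""

parsed = parse_medline_text(sample_text)
for key, value in parsed.items():
    print(key + ":", value)
```

Repeatable tags such as AU come back as lists, while single-valued tags such as TI come back as strings, which is why printing a record shows a mix of both.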
# Problem 1b
from Bio import Entrez
Entrez.email = "rachaelcox@utexas.edu"
handle = Entrez.esearch(db="pubmed", # database to search
                        term="Marcotte EM[Author] AND 2020[Date - Publication]", # search term
                        retmax=100 # maximum number of PMIDs returned (the default is 20)
                        )
record = Entrez.read(handle)
handle.close()
# search returns PubMed IDs (pmids)
pmid_list = record["IdList"]
print("Publications found:", pmid_list)
print("Number of publications:", record["Count"]) # total number of hits, even if IdList is cut off by retmax
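One thing to watch in part (b): the length of "IdList" is capped by retmax, while "Count" reports the total number of hits. The sketch below shows one way to build the search term and compare the two numbers. The helper names `build_author_year_term` and `summarize_search` are hypothetical, and `mock_result` is an invented stand-in for a parsed esearch result so the example runs without network access.

```python
def build_author_year_term(author, year):
    """Return a PubMed query string for one author and one publication year."""
    return f"{author}[Author] AND {year}[Date - Publication]"


def summarize_search(result):
    """Compare the total hit count with the number of IDs actually returned."""
    total = int(result["Count"])       # "Count" is stored as a string
    returned = len(result["IdList"])   # limited by retmax
    truncated = returned < total
    return total, returned, truncated


term = build_author_year_term("Marcotte EM", 2020)
print(term)

# Stand-in for a parsed esearch result; the values are invented for illustration
mock_result = {"Count": "25", "IdList": ["1", "2", "3"]}
total, returned, truncated = summarize_search(mock_result)
print("Total hits:", total)
print("IDs returned:", returned)
print("List truncated by retmax:", truncated)
```

If the list turns out to be truncated, rerunning the search with a larger retmax is the simplest fix.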
# Problem 1c
from Bio import Medline
Entrez.email = "rachaelcox@utexas.edu"
handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)
lit_dict = {} # start with an empty dictionary mapping title -> abstract
for record in records:
    title = record['TI']
    abstract = record.get('AB', '[no abstract available]') # some records have no abstract
    lit_dict[title] = abstract
handle.close()
print('Publication information for "Marcotte EM" in 2020:\n')
for title in lit_dict:
    print('\033[1m' + title) # print title in bold with '\033[1m'
    print('\033[0m' + lit_dict[title]) # switch back to regular font with '\033[0m'
    print()
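Not every PubMed entry has an abstract (errata and editorials, for example), so looking up 'AB' directly can raise a KeyError. Since a Medline record behaves like a dictionary, .get() with a fallback value is one way around this. The sketch below uses plain dictionaries with invented contents in place of real records; `title_and_abstract` and `example_records` are hypothetical names.

```python
def title_and_abstract(record):
    """Return (title, abstract), substituting a placeholder for missing fields."""
    title = record.get("TI", "[no title]")
    abstract = record.get("AB", "[no abstract available]")
    return title, abstract


# Plain dicts standing in for Medline records; contents are invented
example_records = [
    {"TI": "Paper with an abstract", "AB": "Some abstract text."},
    {"TI": "Erratum without an abstract"},
]

example_dict = dict(title_and_abstract(rec) for rec in example_records)
for example_title, example_abstract in example_dict.items():
    print(example_title)
    print(example_abstract)
    print()
```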
Problem 4: From the results of part (b), compile a dictionary with each publication title and its associated author list (AU), source (SO), and abstract (AB) for "Marcotte EM" in 2020. Print each publication title, followed by that paper's author list, then source, then abstract.
from Bio import Medline
handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)
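lit_dict = {} # start with an empty dictionary mapping title -> [authors, source, abstract]
for record in records:
    info = []
    title = record['TI']
    info.append(record.get('AU', [])) # list of author names
    info.append(record.get('SO', '')) # source (journal citation)
    info.append(record.get('AB', '[no abstract available]')) # abstract, if present
    lit_dict[title] = info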
handle.close()
print('Publication information for "Marcotte EM" in 2020:\n')
for title in lit_dict:
    print('\033[1m') # switch to bold font for title
    print(title) # print title
    print('\033[0m', end = '') # switch back to regular font
    print(*lit_dict[title][0], sep = ', ') # authors, comma-separated
    print(lit_dict[title][1]) # source
    print(lit_dict[title][2]) # abstract
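Indexing into a list with [0], [1], and [2] works, but it is easy to mix up which position holds which field. One alternative, sketched below under the assumption that a nested dictionary is acceptable for this problem, is to store each paper's information under named keys. The names `summarize_record` and `format_entry` and the sample record are hypothetical.

```python
BOLD, RESET = "\033[1m", "\033[0m"  # ANSI codes for bold and regular text


def summarize_record(record):
    """Collect authors, source, and abstract under descriptive keys."""
    return {
        "authors": record.get("AU", []),
        "source": record.get("SO", ""),
        "abstract": record.get("AB", "[no abstract available]"),
    }


def format_entry(title, info):
    """Build the printable text for one paper: title, authors, source, abstract."""
    lines = [
        BOLD + title + RESET,
        ", ".join(info["authors"]),
        info["source"],
        info["abstract"],
    ]
    return "\n".join(lines)


# Plain dict standing in for a Medline record; contents are invented
sample_record = {
    "TI": "An example title",
    "AU": ["Doe J", "Roe R"],
    "SO": "Example Journal. 2020;1(1):1-2.",
}

sample_info = summarize_record(sample_record)
entry_text = format_entry(sample_record["TI"], sample_info)
print(entry_text)
```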