Lab Worksheet 11 Solutions

Last week, we talked about how to use Entrez to access genomic information from the NCBI database. This week, we're focusing on how to use Entrez and Medline to search the PubMed (literature) database. For this module, a list of important abbreviations and their meanings can be found here: https://www.nlm.nih.gov/bsd/mms/medlineelements.html
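
As a quick orientation, a few of the element abbreviations that appear in this worksheet can be kept in a small lookup table. This is only a sketch: `MEDLINE_FIELDS` and `describe_fields` are hypothetical names chosen for illustration (they are not part of Biopython), and the descriptions are short paraphrases of the entries on the NLM page linked above.

```python
# A few MEDLINE element abbreviations used in this worksheet.
# MEDLINE_FIELDS and describe_fields are illustrative names, not Biopython API.
MEDLINE_FIELDS = {
    "PMID": "PubMed unique identifier",
    "TI": "Title",
    "AB": "Abstract",
    "AU": "Author(s), abbreviated",
    "FAU": "Author(s), full name",
    "SO": "Source (citation string)",
    "DP": "Date of publication",
    "TA": "Journal title abbreviation",
    "JT": "Journal title",
}

def describe_fields(record):
    """Return (tag, description, value) tuples for every tag in a record."""
    rows = []
    for tag, value in record.items():
        # Tags missing from the small table above fall back to a pointer.
        description = MEDLINE_FIELDS.get(tag, "see the NLM element list")
        rows.append((tag, description, value))
    return rows

# A tiny hand-written dictionary standing in for a parsed Medline record.
example = {"PMID": "32191846", "TA": "Cell", "PST": "aheadofprint"}
for tag, description, value in describe_fields(example):
    print(f"{tag} ({description}): {value}")
```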

Problem 1:

(a) Download the Medline record for the publication with PubMed ID 32191846 and parse it with the Medline.parse() function. Then print every key-value pair in that record.

(b) Use an Entrez esearch query of the pubmed database to find out how many publications "Marcotte EM" wrote in 2020.

(c) From the results of part (b), compile a dictionary of all the publication titles and abstracts for "Marcotte EM" in 2020. Print each publication title, followed by that paper's abstract.

In [1]:
# Problem 1a

from Bio import Entrez, Medline
Entrez.email = "rachaelcox@utexas.edu"

handle = Entrez.efetch(db="pubmed", id='32191846', rettype="medline", retmode="text")
records = Medline.parse(handle)  # iterator over the parsed Medline records
record = list(records)[0]  # only one ID was requested, so keep the first record
handle.close()

for key in record.keys():
    print(key + ":", record[key])
PMID: 32191846
OWN: NLM
STAT: Publisher
LR: 20200319
IS: 1097-4172 (Electronic) 0092-8674 (Linking)
DP: 2020 Mar 16
TI: A Pan-plant Protein Complex Map Reveals Deep Conservation and Novel Assemblies.
LID: S0092-8674(20)30226-9 [pii] 10.1016/j.cell.2020.02.049 [doi]
AB: Plants are foundational for global ecological and economic systems, but most plant proteins remain uncharacterized. Protein interaction networks often suggest protein functions and open new avenues to characterize genes and proteins. We therefore systematically determined protein complexes from 13 plant species of scientific and agricultural importance, greatly expanding the known repertoire of stable protein complexes in plants. By using co-fractionation mass spectrometry, we recovered known complexes, confirmed complexes predicted to occur in plants, and identified previously unknown interactions conserved over 1.1 billion years of green plant evolution. Several novel complexes are involved in vernalization and pathogen defense, traits critical for agriculture. We also observed plant analogs of animal complexes with distinct molecular assemblies, including a megadalton-scale tRNA multi-synthetase complex. The resulting map offers a cross-species view of conserved, stable protein assemblies shared across plant cells and provides a mechanistic, biochemical framework for interpreting plant genetics and mutant phenotypes.
CI: ['Copyright (c) 2020 Elsevier Inc. All rights reserved.']
FAU: ['McWhite, Claire D', 'Papoulas, Ophelia', 'Drew, Kevin', 'Cox, Rachael M', 'June, Viviana', 'Dong, Oliver Xiaoou', 'Kwon, Taejoon', 'Wan, Cuihong', 'Salmi, Mari L', 'Roux, Stanley J', 'Browning, Karen S', 'Chen, Z Jeffrey', 'Ronald, Pamela C', 'Marcotte, Edward M']
AU: ['McWhite CD', 'Papoulas O', 'Drew K', 'Cox RM', 'June V', 'Dong OX', 'Kwon T', 'Wan C', 'Salmi ML', 'Roux SJ', 'Browning KS', 'Chen ZJ', 'Ronald PC', 'Marcotte EM']
AD: Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Plant Pathology and The Genome Center, University of California, Davis, Davis, CA 95616, USA; Joint Bioenergy Institute, Emeryville, CA 94608, USA. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, Republic of Korea. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA; Hubei Key Lab of Genetic Regulation and Integrative Biology, School of Life Sciences, Central China Normal University, No. 152 Luoyu Road, Wuhan 430079, P.R. China. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Department of Plant Pathology and The Genome Center, University of California, Davis, Davis, CA 95616, USA; Joint Bioenergy Institute, Emeryville, CA 94608, USA. 
Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA. Electronic address: marcotte@icmb.utexas.edu.
LA: ['eng']
PT: ['Journal Article']
DEP: 20200316
PL: United States
TA: Cell
JT: Cell
JID: 0413066
SB: IM
OTO: ['NOTNLM']
OT: ['co-fractionation mass spectrometry (CF-MS)', 'comparative proteomics', 'cross-linking mass spectrometry (CL-MS)', 'evolution', 'interaction-to-phenotype', 'pathogen defense', 'plants', 'protein complexes', 'protein interactions']
COIS: ['Declaration of Interests The authors declare no competing interests.']
EDAT: 2020/03/20 06:00
MHDA: 2020/03/20 06:00
CRDT: ['2020/03/20 06:00']
PHST: ['2019/10/15 00:00 [received]', '2020/01/08 00:00 [revised]', '2020/02/21 00:00 [accepted]', '2020/03/20 06:00 [entrez]', '2020/03/20 06:00 [pubmed]', '2020/03/20 06:00 [medline]']
AID: ['S0092-8674(20)30226-9 [pii]', '10.1016/j.cell.2020.02.049 [doi]']
PST: aheadofprint
SO: Cell. 2020 Mar 16. pii: S0092-8674(20)30226-9. doi: 10.1016/j.cell.2020.02.049.
In [2]:
# Problem 1b

from Bio import Entrez
Entrez.email = "rachaelcox@utexas.edu"

handle = Entrez.esearch(db="pubmed",  # database to search
                        term="Marcotte EM[Author] AND 2020[Date - Publication]",  # search term
                        retmax=10  # maximum number of IDs to return
                        )
record = Entrez.read(handle)
handle.close()

# search returns PubMed IDs (pmids)
pmid_list = record["IdList"]
print("Publications found:", pmid_list)
# record["Count"] is the total number of matches; len(pmid_list) is capped by retmax
print("Number of publications:", record["Count"])
Publications found: ['32191846', '32129706', '32129623', '31825225', '31726096', '31416630']
Number of publications: 6
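
One thing to watch in part (b): the IdList returned by esearch holds at most retmax IDs, so its length can understate the total when an author has more matches than that. The parsed esearch result also carries a "Count" entry with the total number of matches. The sketch below shows the comparison on a hand-written dictionary shaped like the parsed result; `summarize_search` and `mock_result` are hypothetical names used for illustration, and no network call is made.

```python
def summarize_search(result):
    """Compare the total hit count with the number of IDs actually returned."""
    total = int(result["Count"])      # total matches reported by esearch (a string)
    returned = len(result["IdList"])  # IDs in this response, at most retmax
    return {"total": total, "returned": returned, "truncated": returned < total}

# Hand-written stand-in for the parsed esearch result shown above.
mock_result = {
    "Count": "6",
    "IdList": ["32191846", "32129706", "32129623",
               "31825225", "31726096", "31416630"],
}
print(summarize_search(mock_result))
```

If "truncated" comes back True, rerunning the search with a larger retmax is one way to retrieve the remaining IDs.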
In [3]:
# Problem 1c

from Bio import Medline
Entrez.email = "rachaelcox@utexas.edu"

handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)

lit_dict = {} # start with an empty dictionary mapping titles to abstracts
for record in records:
    title = record['TI']
    abstract = record['AB']
    lit_dict[title] = abstract

handle.close()
print('Publication information for "Marcotte EM" in 2020:\n')
for title in lit_dict:
    print('\033[1m' + title) # print title in bold with '\033[1m'
    print('\033[0m' + lit_dict[title]) # switch back to regular font with '\033[0m'
    print()
Publication information for "Marcotte EM" in 2020:

A Pan-plant Protein Complex Map Reveals Deep Conservation and Novel Assemblies.
Plants are foundational for global ecological and economic systems, but most plant proteins remain uncharacterized. Protein interaction networks often suggest protein functions and open new avenues to characterize genes and proteins. We therefore systematically determined protein complexes from 13 plant species of scientific and agricultural importance, greatly expanding the known repertoire of stable protein complexes in plants. By using co-fractionation mass spectrometry, we recovered known complexes, confirmed complexes predicted to occur in plants, and identified previously unknown interactions conserved over 1.1 billion years of green plant evolution. Several novel complexes are involved in vernalization and pathogen defense, traits critical for agriculture. We also observed plant analogs of animal complexes with distinct molecular assemblies, including a megadalton-scale tRNA multi-synthetase complex. The resulting map offers a cross-species view of conserved, stable protein assemblies shared across plant cells and provides a mechanistic, biochemical framework for interpreting plant genetics and mutant phenotypes.

Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveal coordinate control of lipid metabolism.
Establishing the pattern of abundance of molecules of interest during cell division has been a long-standing goal of cell cycle studies. Here, for the first time in any system, we present experiment-matched datasets of the levels of RNAs, proteins, metabolites, and lipids from un-arrested, growing, and synchronously dividing yeast cells. Overall, transcript and protein levels were correlated, but specific processes that appeared to change at the RNA level (e.g., ribosome biogenesis), did not do so at the protein level, and vice versa. We also found no significant changes in codon usage or the ribosome content during the cell cycle. We describe an unexpected mitotic peak in the abundance of ergosterol and thiamine biosynthesis enzymes. Although the levels of several metabolites changed in the cell cycle, by far the most significant changes were in the lipid repertoire, with phospholipids and triglycerides peaking strongly late in the cell cycle. Our findings provide an integrated view of the abundance of biomolecules in the eukaryotic cell cycle and point to a coordinate mitotic control of lipid metabolism.

Structural Biology in the Multi-Omics Era.
Rapid developments in cryogenic electron microscopy have opened new avenues to probe the structures of protein assemblies in their near native states. Recent studies have begun applying single -particle analysis to heterogeneous mixtures, revealing the potential of structural-omics approaches that combine the power of mass spectrometry and electron microscopy. Here we highlight advances and challenges in sample preparation, data processing, and molecular modeling for handling increasingly complex mixtures. Such advances will help structural-omics methods extend to cellular-level models of structural biology.

Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access.
A synthesis of the carbopyronine dye Carboxy ATTO 647N from simple materials is reported. This route proceeds in 11 forward steps from 3-bromoaniline with the key xanthone intermediate formed using a new oxidation methodology. The step utilizes an oxidation cycle with base, water, iodine, and more than doubles the yield of the standard permanganate oxidation methodology, accessing gram-scale quantities of this late-stage product. From this, Carboxy ATTO 647N was prepared in only four additional steps. This facile route to a complex fluorophore is expected to enable further studies in fluorescence imaging.

Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections.
Single particle analysis for structure determination in cryo-electron microscopy is traditionally applied to samples purified to near homogeneity as current reconstruction algorithms are not designed to handle heterogeneous mixtures of structures from many distinct macromolecular complexes. We extend on long established methods and demonstrate that relating two-dimensional projection images by their common lines in a graphical framework is sufficient for partitioning distinct protein and multiprotein complexes within the same data set. The feasibility of this approach is first demonstrated on a large set of synthetic reprojections from 35 unique macromolecular structures spanning a mass range of hundreds to thousands of kilodaltons. We then apply our algorithm on cryo-EM data collected from a mixture of five protein complexes and use existing methods to solve multiple three-dimensional structures ab initio. Incorporating methods to sort single particle cryo-EM data from extremely heterogeneous mixtures will alleviate the need for stringent purification and pave the way toward investigation of samples containing many unique structures.

Bringing Microscopy-By-Sequencing into View.
The spatial distribution of molecules and cells is fundamental to understanding biological systems. Traditionally, microscopies based on electromagnetic waves such as visible light have been used to localize cellular components by direct visualization. However, these techniques suffer from limitations of transmissibility and throughput. Complementary to optical approaches, biochemical techniques such as crosslinking can colocalize molecules without suffering the same limitations. However, biochemical approaches are often unable to combine individual colocalizations into a map across entire cells or tissues. Microscopy-by-sequencing techniques aim to biochemically colocalize DNA-barcoded molecules and, by tracking their thus unique identities, reconcile all colocalizations into a global spatial map. Here, we review this new field and discuss its enormous potential to answer a broad spectrum of questions.
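
A note on robustness for part (c): indexing with record['AB'] raises a KeyError for any record that has no abstract, which can happen for items such as editorials or errata. Records from Medline.parse behave like dictionaries, so .get() with a default is one way around this. The sketch below runs on hand-written dictionaries rather than live PubMed data; `build_lit_dict` and `mock_records` are hypothetical names used for illustration.

```python
def build_lit_dict(records, missing="(no abstract available)"):
    """Map each title to its abstract, tolerating records without TI or AB."""
    lit_dict = {}
    for record in records:
        title = record.get("TI", "(no title available)")
        # .get() returns the placeholder instead of raising KeyError.
        lit_dict[title] = record.get("AB", missing)
    return lit_dict

# Hand-written stand-ins for parsed Medline records.
mock_records = [
    {"TI": "Paper with an abstract.", "AB": "Some abstract text."},
    {"TI": "Paper without an abstract."},
]
for title, abstract in build_lit_dict(mock_records).items():
    print(title)
    print(abstract)
    print()
```

Because the dictionary is keyed by title, two records with identical titles would overwrite each other; keying by PMID instead is an alternative if that matters.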

If that was easy...

Problem 4: From the results of part (b), compile a dictionary with each publication title and its associated author list (AU), source (SO), and abstract (AB) for "Marcotte EM" in 2020. Print each publication title, followed by that paper's author list, then source, then abstract.

In [4]:
from Bio import Medline
handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)

lit_dict = {} # start with an empty dictionary mapping titles to [authors, source, abstract]
for record in records:
    info = []
    title = record['TI']
    info.append(record['AU'])
    info.append(record['SO'])
    info.append(record['AB'])
    lit_dict[title] = info

handle.close()

print('Publication information for "Marcotte EM" in 2020:\n')
for title in lit_dict:
    print('\033[1m') # switch to bold font for title
    print(title) # print title
    print('\033[0m', end = '') # switch back to regular font
    print(*lit_dict[title][0], sep = ', ')
    print(lit_dict[title][1])
    print(lit_dict[title][2])
Publication information for "Marcotte EM" in 2020:


A Pan-plant Protein Complex Map Reveals Deep Conservation and Novel Assemblies.
McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ, Browning KS, Chen ZJ, Ronald PC, Marcotte EM
Cell. 2020 Mar 16. pii: S0092-8674(20)30226-9. doi: 10.1016/j.cell.2020.02.049.
Plants are foundational for global ecological and economic systems, but most plant proteins remain uncharacterized. Protein interaction networks often suggest protein functions and open new avenues to characterize genes and proteins. We therefore systematically determined protein complexes from 13 plant species of scientific and agricultural importance, greatly expanding the known repertoire of stable protein complexes in plants. By using co-fractionation mass spectrometry, we recovered known complexes, confirmed complexes predicted to occur in plants, and identified previously unknown interactions conserved over 1.1 billion years of green plant evolution. Several novel complexes are involved in vernalization and pathogen defense, traits critical for agriculture. We also observed plant analogs of animal complexes with distinct molecular assemblies, including a megadalton-scale tRNA multi-synthetase complex. The resulting map offers a cross-species view of conserved, stable protein assemblies shared across plant cells and provides a mechanistic, biochemical framework for interpreting plant genetics and mutant phenotypes.

Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveal coordinate control of lipid metabolism.
Blank HM, Papoulas O, Maitra N, Garge R, Kennedy BK, Schilling B, Marcotte EM, Polymenis M
Mol Biol Cell. 2020 Mar 4:mbcE19120708. doi: 10.1091/mbc.E19-12-0708.
Establishing the pattern of abundance of molecules of interest during cell division has been a long-standing goal of cell cycle studies. Here, for the first time in any system, we present experiment-matched datasets of the levels of RNAs, proteins, metabolites, and lipids from un-arrested, growing, and synchronously dividing yeast cells. Overall, transcript and protein levels were correlated, but specific processes that appeared to change at the RNA level (e.g., ribosome biogenesis), did not do so at the protein level, and vice versa. We also found no significant changes in codon usage or the ribosome content during the cell cycle. We describe an unexpected mitotic peak in the abundance of ergosterol and thiamine biosynthesis enzymes. Although the levels of several metabolites changed in the cell cycle, by far the most significant changes were in the lipid repertoire, with phospholipids and triglycerides peaking strongly late in the cell cycle. Our findings provide an integrated view of the abundance of biomolecules in the eukaryotic cell cycle and point to a coordinate mitotic control of lipid metabolism.

Structural Biology in the Multi-Omics Era.
McCafferty CL, Verbeke EJ, Marcotte EM, Taylor DW
J Chem Inf Model. 2020 Mar 10. doi: 10.1021/acs.jcim.9b01164.
Rapid developments in cryogenic electron microscopy have opened new avenues to probe the structures of protein assemblies in their near native states. Recent studies have begun applying single -particle analysis to heterogeneous mixtures, revealing the potential of structural-omics approaches that combine the power of mass spectrometry and electron microscopy. Here we highlight advances and challenges in sample preparation, data processing, and molecular modeling for handling increasingly complex mixtures. Such advances will help structural-omics methods extend to cellular-level models of structural biology.

Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access.
Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV
Org Lett. 2020 Jan 17;22(2):381-385. doi: 10.1021/acs.orglett.9b03981. Epub 2019 Dec 11.
A synthesis of the carbopyronine dye Carboxy ATTO 647N from simple materials is reported. This route proceeds in 11 forward steps from 3-bromoaniline with the key xanthone intermediate formed using a new oxidation methodology. The step utilizes an oxidation cycle with base, water, iodine, and more than doubles the yield of the standard permanganate oxidation methodology, accessing gram-scale quantities of this late-stage product. From this, Carboxy ATTO 647N was prepared in only four additional steps. This facile route to a complex fluorophore is expected to enable further studies in fluorescence imaging.

Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections.
Verbeke EJ, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM
J Struct Biol. 2020 Jan 1;209(1):107416. doi: 10.1016/j.jsb.2019.107416. Epub 2019 Nov 11.
Single particle analysis for structure determination in cryo-electron microscopy is traditionally applied to samples purified to near homogeneity as current reconstruction algorithms are not designed to handle heterogeneous mixtures of structures from many distinct macromolecular complexes. We extend on long established methods and demonstrate that relating two-dimensional projection images by their common lines in a graphical framework is sufficient for partitioning distinct protein and multiprotein complexes within the same data set. The feasibility of this approach is first demonstrated on a large set of synthetic reprojections from 35 unique macromolecular structures spanning a mass range of hundreds to thousands of kilodaltons. We then apply our algorithm on cryo-EM data collected from a mixture of five protein complexes and use existing methods to solve multiple three-dimensional structures ab initio. Incorporating methods to sort single particle cryo-EM data from extremely heterogeneous mixtures will alleviate the need for stringent purification and pave the way toward investigation of samples containing many unique structures.

Bringing Microscopy-By-Sequencing into View.
Boulgakov AA, Ellington AD, Marcotte EM
Trends Biotechnol. 2020 Feb;38(2):154-162. doi: 10.1016/j.tibtech.2019.06.001. Epub 2019 Aug 12.
The spatial distribution of molecules and cells is fundamental to understanding biological systems. Traditionally, microscopies based on electromagnetic waves such as visible light have been used to localize cellular components by direct visualization. However, these techniques suffer from limitations of transmissibility and throughput. Complementary to optical approaches, biochemical techniques such as crosslinking can colocalize molecules without suffering the same limitations. However, biochemical approaches are often unable to combine individual colocalizations into a map across entire cells or tissues. Microscopy-by-sequencing techniques aim to biochemically colocalize DNA-barcoded molecules and, by tracking their thus unique identities, reconcile all colocalizations into a global spatial map. Here, we review this new field and discuss its enormous potential to answer a broad spectrum of questions.
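
The solution above stores each paper's information in a list and retrieves it by position ([0], [1], [2]). An alternative design, sketched here under the assumption that the same AU, SO, and AB fields are wanted, is to store a small dictionary per title so each value is looked up by its field name. `collect_fields`, `format_entry`, and `mock_records` are hypothetical names used for illustration, and the mock record uses placeholder abstract text instead of live PubMed data.

```python
def collect_fields(records, fields=("AU", "SO", "AB")):
    """Map each title to a dict of the requested fields (None if absent)."""
    lit_dict = {}
    for record in records:
        lit_dict[record["TI"]] = {field: record.get(field) for field in fields}
    return lit_dict

def format_entry(title, info):
    """Return title, authors, source, and abstract as one text block."""
    authors = ", ".join(info["AU"] or [])  # AU is a list of author names
    return "\n".join([title, authors, info["SO"] or "", info["AB"] or ""])

# Hand-written stand-in for one parsed Medline record.
mock_records = [
    {
        "TI": "Bringing Microscopy-By-Sequencing into View.",
        "AU": ["Boulgakov AA", "Ellington AD", "Marcotte EM"],
        "SO": "Trends Biotechnol. 2020 Feb;38(2):154-162.",
        "AB": "Abstract text goes here.",
    }
]
for title, info in collect_fields(mock_records).items():
    print(format_entry(title, info))
    print()
```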