Class 26: BLAST

April 19, 2018

The web interface to BLAST is available here: http://blast.ncbi.nlm.nih.gov/Blast.cgi

Let's search for proteins related to the following query sequence, which is the glycoprotein of Machupo virus (causative agent of Bolivian hemorrhagic fever):

>GI:45825963|Machupo virus glycoprotein
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCS
DGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVS
VLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSN
IQFNISKADESRVYGKKIRNGMRHLFRGFYDPCEEGKVCYVTINQCGDPSSFEYCGTN
YLSKCQFDHVNTLHFLVRSKTHLNF

We can download the BLAST results from the NCBI website in XML format and store them as Machupo_BLAST.xml. This file is available here: http://wilkelab.org/classes/SDS348/data_sets/Machupo_BLAST.xml
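
To get a feel for what such an XML file contains, here is a minimal sketch that reads a tiny, hand-written fragment with Python's standard-library xml.etree.ElementTree. The fragment is heavily trimmed and only modeled on a few tag names of the BLAST XML output (Hit_id, Hit_def, Hit_len, Hsp_score, Hsp_evalue). The values are taken from the first alignment shown further down, and the real file contains many more fields per hit.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed fragment for illustration only;
# the real Machupo_BLAST.xml contains many more elements per hit.
xml_text = """
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|45825964|gb|AAS77647.1|</Hit_id>
          <Hit_def>glycoprotein 1, partial [Machupo mammarenavirus]</Hit_def>
          <Hit_len>257</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_score>1381</Hsp_score>
              <Hsp_evalue>0</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

root = ET.fromstring(xml_text)

hits = []
for hit in root.iter("Hit"):
    # combine the identifier and the description into one title string
    title = hit.findtext("Hit_id") + " " + hit.findtext("Hit_def")
    length = int(hit.findtext("Hit_len"))
    # one hit can contain several HSPs, so collect (score, E value) pairs
    hsps = [(float(h.findtext("Hsp_score")), float(h.findtext("Hsp_evalue")))
            for h in hit.iter("Hsp")]
    hits.append((title, length, hsps))

for title, length, hsps in hits:
    print("sequence ID:", title)
    print("length:", length)
    for score, evalue in hsps:
        print("score:", score, "e value:", evalue)
```

In practice there is no need to walk the XML tree by hand, since a dedicated parser takes care of all fields and of files with multiple hits and HSPs.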

Now we can process this file with Biopython.

In [1]:
from Bio.Blast import NCBIXML
from urllib.request import urlretrieve # to download xml file

# download file from course website and store locally
urlretrieve('http://wilkelab.org/classes/SDS348/data_sets/Machupo_BLAST.xml', 'Machupo_BLAST.xml')

# open the downloaded file and parse with NCBIXML.read();
# the with statement closes the file automatically
with open("Machupo_BLAST.xml") as blast_handle:
    blast_record = NCBIXML.read(blast_handle)

imax = 30 # process the first 30 alignments
for alignment in blast_record.alignments[:imax]:
    # we need a for loop here because in theory we could have
    # more than one hsp (High-scoring Segment Pair) per alignment
    for hsp in alignment.hsps:
        print('\n****Alignment****')
        print('sequence ID:', alignment.title)
        print('length:', alignment.length)
        print('score:', hsp.score)
        print('e value:', hsp.expect)
        print("Query:", hsp.query[0:100] + '...')
        print("Match:", hsp.match[0:100] + '...')
        print("  Hit:", hsp.sbjct[0:100] + '...')
****Alignment****
sequence ID: gi|45825964|gb|AAS77647.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1381.0
e value: 0.0
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45826506|gb|AAS77879.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1379.0
e value: 0.0
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825936|gb|AAS77633.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1274.0
e value: 4.8109e-175
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQL+SFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLVSFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825934|gb|AAS77632.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1274.0
e value: 5.5461e-175
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825948|gb|AAS77639.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825950|gb|AAS77640.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1269.0
e value: 3.05564e-174
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825952|gb|AAS77641.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825954|gb|AAS77642.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825956|gb|AAS77643.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825958|gb|AAS77644.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1266.0
e value: 7.65872e-174
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825944|gb|AAS77637.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825946|gb|AAS77638.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1262.0
e value: 3.31687e-173
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825960|gb|AAS77645.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1258.0
e value: 1.2876e-172
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLC+LNN+FYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCILNNNFYY...

****Alignment****
sequence ID: gi|45825932|gb|AAS77631.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1257.0
e value: 1.86764e-172
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825912|gb|AAS77621.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825914|gb|AAS77622.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825916|gb|AAS77623.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825918|gb|AAS77624.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825920|gb|AAS77625.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825922|gb|AAS77626.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825924|gb|AAS77627.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825926|gb|AAS77628.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825928|gb|AAS77629.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825930|gb|AAS77630.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1253.0
e value: 9.52979e-172
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKG+INLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHS+ELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825942|gb|AAS77636.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1250.0
e value: 2.60674e-171
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|62766416|gb|AAX99337.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1272.0
e value: 4.20087e-171
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825938|gb|AAS77634.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1248.0
e value: 4.31124e-171
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825940|gb|AAS77635.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1242.0
e value: 3.92735e-170
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|34365533|ref|NP_899212.1| glycoprotein precursor [Machupo mammarenavirus] >gi|22901291|gb|AAN09942.1| glycoprotein precursor [Machupo mammarenavirus] >gi|23307851|gb|AAN05425.1| glycoprotein precursor [Machupo mammarenavirus] >gi|45826503|gb|AAS77877.1| glycoprotein precursor [Machupo mammarenavirus] >gi|48095766|gb|AAT40451.1| glycoprotein precursor [Machupo mammarenavirus] >gi|62766413|gb|AAX99335.1| glycoprotein precursor [Machupo mammarenavirus] >gi|365976987|gb|AEX08372.1| glycoprotein precursor [Machupo mammarenavirus] >gi|666915575|gb|AIG51558.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1265.0
e value: 6.06597e-170
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|62766419|gb|AAX99339.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1253.0
e value: 3.68973e-168
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|62766404|gb|AAX99329.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1249.0
e value: 1.37897e-167
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKG+INLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHS+ELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|45825962|gb|AAS77646.1| glycoprotein 1, partial [Machupo mammarenavirus]
length: 257
score: 1224.0
e value: 1.91596e-167
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|82002961|sp|Q6IUF7.1|GLYC_MACHU RecName: Full=Pre-glycoprotein polyprotein GP complex; Contains: RecName: Full=Stable signal peptide; Short=SSP; Contains: RecName: Full=Glycoprotein G1; Short=GP1; Contains: RecName: Full=Glycoprotein G2; Short=GP2 >gi|48525711|gb|AAT45081.1| glycoprotein precursor [Machupo mammarenavirus] >gi|62766401|gb|AAX99327.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1248.0
e value: 2.10907e-167
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|62766410|gb|AAX99333.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1238.0
e value: 6.66728e-166
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|48095772|gb|AAT40455.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1213.0
e value: 4.05713e-162
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNS YY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSLYY...

****Alignment****
sequence ID: gi|62766407|gb|AAX99331.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1212.0
e value: 5.15547e-162
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY...

****Alignment****
sequence ID: gi|365976993|gb|AEX08376.1| glycoprotein precursor [Machupo mammarenavirus]
length: 496
score: 1197.0
e value: 9.28986e-160
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL L GRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNS YY...
  Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLXGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSXYY...

****Alignment****
sequence ID: gi|255648557|gb|ACU24736.1| glycoprotein precursor, partial [Machupo mammarenavirus]
length: 473
score: 1093.0
e value: 2.75951e-144
Query: VAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYD...
Match: VAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQ LLANHSNELPSLCMLNNSFYYMKGG N FLIRVSD+SVLMKE+D...
  Hit: VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMKEHD...

****Alignment****
sequence ID: gi|255648545|gb|ACU24728.1| glycoprotein precursor, partial [Machupo mammarenavirus] >gi|255648548|gb|ACU24730.1| glycoprotein precursor, partial [Machupo mammarenavirus] >gi|255648551|gb|ACU24732.1| glycoprotein precursor, partial [Machupo mammarenavirus]
length: 473
score: 1086.0
e value: 3.4405e-143
Query: VAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYD...
Match: VAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYYMKGG N FLIRVS +SVL +E+D...
  Hit: VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSISVLSREHD...

****Alignment****
sequence ID: gi|255648554|gb|ACU24734.1| glycoprotein precursor, partial [Machupo mammarenavirus]
length: 473
score: 1048.0
e value: 1.77582e-137
Query: VAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYD...
Match: VAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQ LLANHSNELPSLCMLNNSFYYMKGG N FLIRVS VSV+ +E+D...
  Hit: VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVVSREHD...

****Alignment****
sequence ID: gi|240104274|pdb|2WFO|A Chain A, Crystal Structure Of Machupo Virus Envelope Glycoprotein Gp1
length: 182
score: 938.0
e value: 4.9368e-125
Query: ELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKADESR...
Match: ELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKADESR...
  Hit: ELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKADESR...

****Alignment****
sequence ID: gi|290790109|pdb|3KAS|B Chain B, Machupo Virus Gp1 Bound To Human Transferrin Receptor 1
length: 162
score: 841.0
e value: 1.04041e-110
Query: NHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKA...
Match: NHSNELPSLCMLNNSFYYM+GG N FLIRVSD+SVLMKEYDVS+YEPEDLGNCLNKSDSSWAIHWFS ALGHDWLMDPPMLCRNKTKKEGSNIQFNISKA...
  Hit: NHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDISVLMKEYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNISKA...

****Alignment****
sequence ID: gi|40807309|ref|NP_955756.1| glycoprotein G1 [Junin mammarenavirus]
length: 247
score: 710.0
e value: 1.29708e-89
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQ ISF QEIP FLQEALNIALVAVSLIA+IKGI+NLYKSGLFQF  FL LAGRSC++  FKIGLHTEFQ+V+F+M  L +N+ ++LP LC LN S  Y...
  Hit: MGQFISFMQEIPTFLQEALNIALVAVSLIAIIKGIVNLYKSGLFQFFVFLALAGRSCTEEAFKIGLHTEFQTVSFSMVGLFSNNPHDLPLLCTLNKSHLY...

****Alignment****
sequence ID: gi|115510974|gb|ABI99475.1| glycoprotein precursor [Junin mammarenavirus]
length: 485
score: 718.0
e value: 1.11343e-87
Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY...
Match: MGQ ISF QEIP FLQEALNIALVAVSLIA+IKGI+NLYKSGLFQF  FL LAGRSC++  FKIGLHTEFQ+V+F+M  LL+N  ++LP LC LN S  Y...
  Hit: MGQFISFMQEIPTFLQEALNIALVAVSLIAIIKGIVNLYKSGLFQFFVFLALAGRSCTEEAFKIGLHTEFQTVSFSMVGLLSNSPHDLPLLCTLNKSHLY...
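
In the output above, the Match line summarizes how each Query residue compares to the Hit residue in the same alignment column: a letter marks an identical residue, a + marks a non-identical substitution that still scores positively under the scoring matrix, and a space marks everything else. As a minimal sketch, we can tally these three cases for a short stretch of a match line. The helper summarize_match is hypothetical and written only for this illustration, and the strings are the first ten columns of the third alignment printed above.

```python
def summarize_match(match):
    """Count identical, similar ('+'), and other positions in a BLAST match line."""
    identical = sum(1 for c in match if c.isalpha())
    similar = match.count("+")
    other = len(match) - identical - similar
    return identical, similar, other

# first ten alignment columns of the third alignment shown above
query = "MGQLISFFQE"
match = "MGQL+SFFQE"
hit   = "MGQLVSFFQE"

identical, similar, other = summarize_match(match)
percent_identity = 100 * identical / len(match)
print("identical:", identical, "similar:", similar, "other:", other)
print("percent identity:", percent_identity)
```

Keep in mind that the printed strings are cut off after 100 characters, so counts taken from the printed output only describe the displayed part of each alignment. The parsed HSP objects should also expose precomputed counts such as hsp.identities and hsp.positives; treat those attribute names as an assumption to check against the Biopython documentation.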

Problems

Problem 1:

Count the number of hits with an E value of less than or equal to 1e-100.

In [2]:
E_cutoff = 1e-100
count = 0
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect <= E_cutoff:
            count += 1
            
print("There are", count, "hits with E <=", E_cutoff)
There are 28 hits with E <= 1e-100
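
The same kind of count can be written more compactly with a generator expression. The sketch below uses a handful of E values copied from the alignments printed earlier in place of the parsed record, so that it runs without the XML file; the list e_values is only a small sample, which is why the count differs from the full result.

```python
# E values copied from a few of the alignments printed earlier (a sample, not the full set)
e_values = [0.0, 4.8109e-175, 4.9368e-125, 1.04041e-110, 1.29708e-89, 1.11343e-87]

E_cutoff = 1e-100
# add 1 for every E value at or below the cutoff
count = sum(1 for e in e_values if e <= E_cutoff)
print("There are", count, "hits with E <=", E_cutoff)
```

With the parsed record, the equivalent expression would loop over both levels, for example sum(1 for a in blast_record.alignments for h in a.hsps if h.expect <= E_cutoff). Note that both versions count HSPs, so an alignment with two qualifying HSPs would be counted twice.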

Problem 2:

Extract the GenBank identifiers (written as gb|string|, where string is the actual identifier, consisting of letters, numbers, and the period symbol) for all matches with an E value of less than or equal to 1e-100, and store them in a Python list. For matches that list multiple GenBank identifiers, extract only the first one.

In [3]:
import re

E_cutoff = 1e-100
gb_list = []
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect <= E_cutoff:
            match = re.search(r'gb\|([\w.]+)\|', alignment.title)
            if match:
                gb_id = match.group(1)
                gb_list.append(gb_id)
            else:
                print("could not find genbank identifier in ", alignment.title)
                
print(gb_list)
could not find genbank identifier in  gi|240104274|pdb|2WFO|A Chain A, Crystal Structure Of Machupo Virus Envelope Glycoprotein Gp1
could not find genbank identifier in  gi|290790109|pdb|3KAS|B Chain B, Machupo Virus Gp1 Bound To Human Transferrin Receptor 1
['AAS77647.1', 'AAS77879.1', 'AAS77633.1', 'AAS77632.1', 'AAS77639.1', 'AAS77641.1', 'AAS77637.1', 'AAS77645.1', 'AAS77631.1', 'AAS77621.1', 'AAS77636.1', 'AAX99337.1', 'AAS77634.1', 'AAS77635.1', 'AAN09942.1', 'AAX99339.1', 'AAX99329.1', 'AAS77646.1', 'AAT45081.1', 'AAX99333.1', 'AAT40455.1', 'AAX99331.1', 'AEX08376.1', 'ACU24736.1', 'ACU24728.1', 'ACU24734.1']
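
To see why only the first identifier is extracted and why the two pdb entries produce a warning, here is a minimal sketch that applies the regular expression to two titles copied from the BLAST output. re.search() stops at the first occurrence in a string, whereas re.findall() would return every occurrence.

```python
import re

# titles copied from the BLAST output shown earlier
titles = [
    "gi|45825948|gb|AAS77639.1| glycoprotein 1, partial [Machupo mammarenavirus] "
    ">gi|45825950|gb|AAS77640.1| glycoprotein 1, partial [Machupo mammarenavirus]",
    "gi|240104274|pdb|2WFO|A Chain A, Crystal Structure Of Machupo Virus Envelope Glycoprotein Gp1",
]

# gb| followed by letters, digits, underscores, or periods, followed by |
pattern = re.compile(r'gb\|([\w.]+)\|')

first_ids = []
all_ids = []
for title in titles:
    m = pattern.search(title)                # first occurrence only
    first_ids.append(m.group(1) if m else None)
    all_ids.append(pattern.findall(title))   # every occurrence

print(first_ids)
print(all_ids)
```

The second title contains pdb| but never gb|, so the search returns None and no identifier is stored for it.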

If this was easy

Problem 3:

Using the list of GenBank identifiers obtained in the previous exercise, download the corresponding sequences from GenBank and print them out in FASTA format. Hint: You will have to specify the database as "protein" for this to work, since the previous exercise generated identifiers for protein sequences.

Hint: Use the function SeqIO.write() to output your results in FASTA format, and use sys.stdout from the sys module as your output handle.

In [4]:
from Bio import Entrez, SeqIO
import sys

Entrez.email = "wilke@austin.utexas.edu" # put your email here

handle = Entrez.efetch(db="protein", id=gb_list, rettype="gb", retmode="text")
records = SeqIO.parse(handle, "genbank")

for record in records:
    SeqIO.write(record, sys.stdout, "fasta")
    
# Important: close the handle only after you have iterated over the records;
# otherwise you will get an error.
handle.close()
>AAS77647.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDG
TFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMK
EYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADESRVYGKKIRNGMRHLFRGFYDPCEEGKVCYVTINQCGDPSSFEYCGTNYLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77879.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDG
TFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMK
EYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADESRVYGKKIRNGMRHLFRGFYDPCEEGKVCYVTINQCGDPSSFEYCGTNYLSKCQFD
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE
LMSVPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNISEFRNDWILESDHLISEMLSK
EYAERQSKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSLGGCRC
GKYPRLKKPTVWHRRH
>AAS77633.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLVSFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDVSVLMK
EYDVSVYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNRTKKEGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGTNYLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77632.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK
EYDVSVYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNRTKKEGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGTNYLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77639.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK
EYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77641.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDISVLMK
EYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77637.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK
EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS
KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77645.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCILNNNFYYMKGGVNTFLIRVSDISVLMK
EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDTRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77631.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDVSVLMK
EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77621.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYYMKGGVNTFLIRVSDVSVLMK
EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD
HVNTLHFLVRSKTHLNF
>AAS77636.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK
EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSFDYCGMDHLSKCQFD
HVNTLHFLVRSKTHLNF
>AAX99337.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK
EYDVSVYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNRTKKEGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGTNYLSKCQFD
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTINALISDNLLMKNKIKE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGLPTHRHLKGEACPLPHRLDSFGGCRC
GKYPRLKKPTVWHRRH
>AAS77634.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK
EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSFDYCGTDHLSKCQFD
HVNTLHFLVRRKTHLNF
>AAS77635.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLTK
EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSLDYCGTDHLSKCQFD
HVNTLHFLVRSKTHLNF
>AAN09942.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDISVLMK
EYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRC
GKYPRLKKPTIWHKRH
>AAX99339.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDVSVLMK
EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRC
GKYPRLKKPTVWHRRH
>AAX99329.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYYMKGGVNTFLIRVSDVSVLMK
EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRC
GKYPRLKKPTVWHRRH
>AAS77646.1 glycoprotein 1, partial [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVLMK
EHDVSVYEPEDLGNCLNKSDSSWAVHWLSNALGHDWLMDSPMLCRNKTKMEGSNIQLNIS
KADDARVYGKKIRNGMRHLFRGFHDSCEEGKLCYLTINQCGDPSSFDYCSTNHLSKCQFD
HVNTLHFLVRSKSHLNF
>AAT45081.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK
EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSFDYCGMDHLSKCQFD
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTVFFTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRC
GKYPRLRKPTIWHKRH
>AAX99333.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLTK
EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS
KADDVRVYGKKIRNGMRHLFRGFHDPCEEGRKCYLTINQCGDPSSLDYCGTDHLSKCQFD
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDXSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTVFFTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRC
GKYPRLRKPTIWHRRH
>AAT40455.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSLYYMKGGVNTFLIRVSSVSVLMK
EHDVSVYEPEDLGNCLNKSDSSWAVHWLSNALGHDWLMDSPMLCRNKTKMEGSNIQLNIS
KADDARVYGKKIRNGMRHLFRGFHDSCEEGKLCYLTINQCGDPSSFDYCSTNHLSKCQFD
HVNTLHFLVRSKSHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIKE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSLGGCRC
GKYPRLKKPTVWHRRH
>AAX99331.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVLSR
EHDVSVYEPEDLENCFNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKVEGSNIQFNIS
KADDTKVYGKKIRNGMRHLFRGFYDLCEEGKVCYLTINQCGDPSSFDYCNTSYLSKCQFD
HVNTLQFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYVERQGKTPITLVDICFWSTVFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRC
GKYPRLKKPTVWHRRH
>AEX08376.1 glycoprotein precursor [Machupo mammarenavirus]
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLXGRSCSDG
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSXYYMKGGVNTFLIRVSSVSVLMK
EXDVSVYEPEDLGNCLNKSDSSWAVHWLSNALGHDWLMDSPMLCRNKTXMEGSNIQLNIS
KADDARVYGKKIRNGMRHLFRGFHDSCEEGKLCYLTINQCGDPSSFDYCSTNHLSKCQFD
HVNTLHFLVRSKSHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIXAKMKCFGNTA
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIKE
LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK
EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSLGGCRC
GKYPRLKKPTVWHRRH
>ACU24736.1 glycoprotein precursor, partial [Machupo mammarenavirus]
VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLAN
HSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMKEHDVSIYEPEDLGNCLNKSDSSW
AIHWFSNALGHDWLMDPPMLCRNKTKPEGSNIELNISKADDVRVYGKKIRNGMRHLFRGF
HDSCEEGKKCYLTINQCGDPSSIDYCNTGHLSKCQFDHVNTLHFLVRSKTHLNFERSLKA
FFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTAVAKCNQNHDSEFCDMLRLFDYNK
NAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKELMSIPYCNYTKFWYVNHTLTGQH
TLPRCWLIKNGSYLNTSEFRNDWILESDHLISEMLSKEYAERQGKTPITLVDICFWSTVF
FTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRCGKYPRLRKPTIWHKRH
>ACU24728.1 glycoprotein precursor, partial [Machupo mammarenavirus]
VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLAN
HSNELPSLCMLNNSFYYMKGGVNTFLIRVSSISVLSREHDVLVHEPDDLENCLNKSDSSW
AIHWFSNALGHDWLMDPPMLCRNKTKVEGSNIQFNISKADDTKVYGKKIRNGMRHLFRGF
HDLCEEGKVCYLTINQCGDPSSFDYCSTSYLSKCQFDHVNTLHFLVRSKTHLNFERSLKA
FFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTAVAKCNQNHDSEFCDMLRLFDYNK
NAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRELMSIPYCNYTKFWYVNHTLTGQH
TLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSKEYAERQGKTPITLVDICFWSTVF
FTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRCGKYPRLKKPTVWHRRH
>ACU24734.1 glycoprotein precursor, partial [Machupo mammarenavirus]
VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLAN
HSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVVSREHDVLVHEPEDLGNCLNESDSSW
ALHWFSNALGHDWLVDPPMLCRNKTKVEGSNIQFNISKADDTKVYGKKIRNGMRHLFRGF
HDLCEEGKVCYLTINQCGDPSSFDYCDTNHLSKCQFDHVNTLHFLVRSKTHLNFERSLKA
FFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTAVAKCNQNHDSEFCDMLRLFDYNK
NAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRELMSIPYCNYTKFWYVNHTLTGQH
TLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSKEYVERQGKTPITLVDICFWSTVF
FTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRCGKYPRLKKPTVWHRRH
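
Each FASTA record printed above consists of a header line starting with > followed by the sequence wrapped at 60 characters per line. A minimal standard-library sketch of that layout follows. The function format_fasta is a hypothetical helper written for illustration, not part of Biopython, and the width of 60 is simply read off the output above.

```python
def format_fasta(header, sequence, width=60):
    """Return a FASTA-formatted string with the sequence wrapped at `width` characters."""
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

# the query sequence from the top of this worksheet
query = (
    "MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCS"
    "DGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVS"
    "VLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSN"
    "IQFNISKADESRVYGKKIRNGMRHLFRGFYDPCEEGKVCYVTINQCGDPSSFEYCGTN"
    "YLSKCQFDHVNTLHFLVRSKTHLNF"
)

fasta_text = format_fasta("GI:45825963|Machupo virus glycoprotein", query)
print(fasta_text, end="")
```

The wrapped lines should match the sequence lines of the first record in the output above, since that record is identical to the query.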