April 30, 2020
The web interface to BLAST is available here: http://blast.ncbi.nlm.nih.gov/Blast.cgi
Let's search for proteins related to the following query sequence, which is the glycoprotein of Machupo virus (causative agent of Bolivian hemorrhagic fever):
>GI:45825963|Machupo virus glycoprotein
MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCS
DGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVS
VLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSN
IQFNISKADESRVYGKKIRNGMRHLFRGFYDPCEEGKVCYVTINQCGDPSSFEYCGTN
YLSKCQFDHVNTLHFLVRSKTHLNF
We can download the BLAST results from the NCBI website in XML format and store them as Machupo_BLAST.xml. This file is also available from the course website.
Now we can process this file with Biopython.
from Bio.Blast import NCBIXML
from urllib.request import urlretrieve # to download xml file
# download file from course website and store locally
urlretrieve('http://wilkelab.org/classes/SDS348/data_sets/Machupo_BLAST.xml', 'Machupo_BLAST.xml')
# open the downloaded file and parse with NCBIXML.read()
blast_handle = open("Machupo_BLAST.xml")
blast_record = NCBIXML.read(blast_handle)
blast_handle.close()
imax = 30 # process the first 30 alignments
i = 0
for alignment in blast_record.alignments:
    i += 1
    if i > imax:
        break
    # we need a for loop here because in theory we could have
    # more than one hsp (High-scoring Segment Pair) per alignment
    for hsp in alignment.hsps:
        print('\n****Alignment****')
        print('sequence ID:', alignment.title)
        print('length:', alignment.length)
        print('score:', hsp.score)
        print('e value:', hsp.expect)
        print("Query:", hsp.query[0:100] + '...')
        print("Match:", hsp.match[0:100] + '...')
        print(" Hit:", hsp.sbjct[0:100] + '...')
****Alignment**** sequence ID: gi|45825964|gb|AAS77647.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1381.0 e value: 0.0 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45826506|gb|AAS77879.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1379.0 e value: 0.0 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825936|gb|AAS77633.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1274.0 e value: 4.8109e-175 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQL+SFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLVSFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825934|gb|AAS77632.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1274.0 e value: 5.5461e-175 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... 
****Alignment**** sequence ID: gi|45825948|gb|AAS77639.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825950|gb|AAS77640.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1269.0 e value: 3.05564e-174 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825952|gb|AAS77641.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825954|gb|AAS77642.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825956|gb|AAS77643.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825958|gb|AAS77644.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1266.0 e value: 7.65872e-174 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825944|gb|AAS77637.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825946|gb|AAS77638.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1262.0 e value: 3.31687e-173 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... 
****Alignment**** sequence ID: gi|45825960|gb|AAS77645.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1258.0 e value: 1.2876e-172 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLC+LNN+FYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCILNNNFYY... ****Alignment**** sequence ID: gi|45825932|gb|AAS77631.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1257.0 e value: 1.86764e-172 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825912|gb|AAS77621.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825914|gb|AAS77622.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825916|gb|AAS77623.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825918|gb|AAS77624.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825920|gb|AAS77625.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825922|gb|AAS77626.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825924|gb|AAS77627.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825926|gb|AAS77628.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825928|gb|AAS77629.1| glycoprotein 1, partial [Machupo mammarenavirus] >gi|45825930|gb|AAS77630.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1253.0 e value: 9.52979e-172 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... 
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKG+INLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHS+ELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825942|gb|AAS77636.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1250.0 e value: 2.60674e-171 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|62766416|gb|AAX99337.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1272.0 e value: 4.20087e-171 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825938|gb|AAS77634.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1248.0 e value: 4.31124e-171 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825940|gb|AAS77635.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1242.0 e value: 3.92735e-170 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... 
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|34365533|ref|NP_899212.1| glycoprotein precursor [Machupo mammarenavirus] >gi|22901291|gb|AAN09942.1| glycoprotein precursor [Machupo mammarenavirus] >gi|23307851|gb|AAN05425.1| glycoprotein precursor [Machupo mammarenavirus] >gi|45826503|gb|AAS77877.1| glycoprotein precursor [Machupo mammarenavirus] >gi|48095766|gb|AAT40451.1| glycoprotein precursor [Machupo mammarenavirus] >gi|62766413|gb|AAX99335.1| glycoprotein precursor [Machupo mammarenavirus] >gi|365976987|gb|AEX08372.1| glycoprotein precursor [Machupo mammarenavirus] >gi|666915575|gb|AIG51558.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1265.0 e value: 6.06597e-170 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|62766419|gb|AAX99339.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1253.0 e value: 3.68973e-168 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|62766404|gb|AAX99329.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1249.0 e value: 1.37897e-167 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... 
Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKG+INLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHS+ELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|45825962|gb|AAS77646.1| glycoprotein 1, partial [Machupo mammarenavirus] length: 257 score: 1224.0 e value: 1.91596e-167 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|82002961|sp|Q6IUF7.1|GLYC_MACHU RecName: Full=Pre-glycoprotein polyprotein GP complex; Contains: RecName: Full=Stable signal peptide; Short=SSP; Contains: RecName: Full=Glycoprotein G1; Short=GP1; Contains: RecName: Full=Glycoprotein G2; Short=GP2 >gi|48525711|gb|AAT45081.1| glycoprotein precursor [Machupo mammarenavirus] >gi|62766401|gb|AAX99327.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1248.0 e value: 2.10907e-167 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|62766410|gb|AAX99333.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1238.0 e value: 6.66728e-166 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... 
Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|48095772|gb|AAT40455.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1213.0 e value: 4.05713e-162 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNS YY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSLYY... ****Alignment**** sequence ID: gi|62766407|gb|AAX99331.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1212.0 e value: 5.15547e-162 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYY... ****Alignment**** sequence ID: gi|365976993|gb|AEX08376.1| glycoprotein precursor [Machupo mammarenavirus] length: 496 score: 1197.0 e value: 9.28986e-160 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFL L GRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNS YY... Hit: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLXGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSXYY... ****Alignment**** sequence ID: gi|255648557|gb|ACU24736.1| glycoprotein precursor, partial [Machupo mammarenavirus] length: 473 score: 1093.0 e value: 2.75951e-144 Query: VAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYD... Match: VAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQ LLANHSNELPSLCMLNNSFYYMKGG N FLIRVSD+SVLMKE+D... 
Hit: VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMKEHD... ****Alignment**** sequence ID: gi|255648545|gb|ACU24728.1| glycoprotein precursor, partial [Machupo mammarenavirus] >gi|255648548|gb|ACU24730.1| glycoprotein precursor, partial [Machupo mammarenavirus] >gi|255648551|gb|ACU24732.1| glycoprotein precursor, partial [Machupo mammarenavirus] length: 473 score: 1086.0 e value: 3.4405e-143 Query: VAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYD... Match: VAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQRLLANHSNELPSLCMLNNSFYYMKGG N FLIRVS +SVL +E+D... Hit: VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSISVLSREHD... ****Alignment**** sequence ID: gi|255648554|gb|ACU24734.1| glycoprotein precursor, partial [Machupo mammarenavirus] length: 473 score: 1048.0 e value: 1.77582e-137 Query: VAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYD... Match: VAVSLIAVIKGIINLYKSGLFQFIFFL LAGRSCSDGTFKIGLHTEFQSVT TMQ LLANHSNELPSLCMLNNSFYYMKGG N FLIRVS VSV+ +E+D... Hit: VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVVSREHD... ****Alignment**** sequence ID: gi|240104274|pdb|2WFO|A Chain A, Crystal Structure Of Machupo Virus Envelope Glycoprotein Gp1 length: 182 score: 938.0 e value: 4.9368e-125 Query: ELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKADESR... Match: ELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKADESR... Hit: ELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKADESR... 
****Alignment**** sequence ID: gi|290790109|pdb|3KAS|B Chain B, Machupo Virus Gp1 Bound To Human Transferrin Receptor 1 length: 162 score: 841.0 e value: 1.04041e-110 Query: NHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMKEYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNISKA... Match: NHSNELPSLCMLNNSFYYM+GG N FLIRVSD+SVLMKEYDVS+YEPEDLGNCLNKSDSSWAIHWFS ALGHDWLMDPPMLCRNKTKKEGSNIQFNISKA... Hit: NHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDISVLMKEYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNISKA... ****Alignment**** sequence ID: gi|40807309|ref|NP_955756.1| glycoprotein G1 [Junin mammarenavirus] length: 247 score: 710.0 e value: 1.29708e-89 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQ ISF QEIP FLQEALNIALVAVSLIA+IKGI+NLYKSGLFQF FL LAGRSC++ FKIGLHTEFQ+V+F+M L +N+ ++LP LC LN S Y... Hit: MGQFISFMQEIPTFLQEALNIALVAVSLIAIIKGIVNLYKSGLFQFFVFLALAGRSCTEEAFKIGLHTEFQTVSFSMVGLFSNNPHDLPLLCTLNKSHLY... ****Alignment**** sequence ID: gi|115510974|gb|ABI99475.1| glycoprotein precursor [Junin mammarenavirus] length: 485 score: 718.0 e value: 1.11343e-87 Query: MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYY... Match: MGQ ISF QEIP FLQEALNIALVAVSLIA+IKGI+NLYKSGLFQF FL LAGRSC++ FKIGLHTEFQ+V+F+M LL+N ++LP LC LN S Y... Hit: MGQFISFMQEIPTFLQEALNIALVAVSLIAIIKGIVNLYKSGLFQFFVFLALAGRSCTEEAFKIGLHTEFQTVSFSMVGLLSNSPHDLPLLCTLNKSHLY...
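The manual counter in the loop above can also be written with `itertools.islice`, which stops iterating after a fixed number of items. This is a minimal sketch. It uses a plain list of strings as a hypothetical stand-in for `blast_record.alignments`, so it runs without the BLAST file.

```python
from itertools import islice

# Hypothetical stand-in for blast_record.alignments; any iterable works here.
alignments = [f"alignment {n}" for n in range(1, 51)]

imax = 30  # process the first 30 alignments, as in the loop above

# islice yields at most imax items and then stops, so no counter is needed.
first_alignments = list(islice(alignments, imax))

print(len(first_alignments), "alignments selected")
print("first:", first_alignments[0], "| last:", first_alignments[-1])
```

With the real record, the same idea would read `for alignment in islice(blast_record.alignments, imax):`.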
Problem 1:
Count the number of hits with an E value of less than or equal to 1e-100.
E_cutoff = 1e-100
count = 0
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect <= E_cutoff:
            count += 1
print("There are", count, "hits with E <=", E_cutoff)
There are 28 hits with E <= 1e-100
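Note that the loop above counts HSPs. That equals the number of hits only when every alignment has exactly one HSP. If an alignment had several HSPs at or below the cutoff, it would be counted more than once. The following is a minimal sketch of counting each alignment at most once. `FakeHSP` and `FakeAlignment` are hypothetical stand-ins that only mimic the `expect` and `hsps` attributes used above. The E values are taken from the output shown earlier, but their grouping into alignments is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeHSP:
    """Hypothetical stand-in for a Biopython HSP; only `expect` is modeled."""
    expect: float


@dataclass
class FakeAlignment:
    """Hypothetical stand-in for a Biopython alignment; only `hsps` is used."""
    title: str
    hsps: List[FakeHSP] = field(default_factory=list)


def count_hits(alignments, e_cutoff):
    """Count alignments whose best (smallest) HSP E value is <= e_cutoff."""
    count = 0
    for alignment in alignments:
        if alignment.hsps and min(h.expect for h in alignment.hsps) <= e_cutoff:
            count += 1
    return count


# Invented grouping: "hit B" has two HSPs below the cutoff but is one hit.
alignments = [
    FakeAlignment("hit A", [FakeHSP(0.0)]),
    FakeAlignment("hit B", [FakeHSP(4.8109e-175), FakeHSP(1.04041e-110)]),
    FakeAlignment("hit C", [FakeHSP(1.29708e-89)]),
]

print("hits with E <= 1e-100:", count_hits(alignments, 1e-100))
```

Counting per HSP would give 3 for this toy input, whereas counting per alignment gives 2.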
Problem 2:
Extract the GenBank identifiers (written as gb|string|, where string is the actual identifier, consisting of letters, numbers, and the period symbol) for all matches with an E value of less than or equal to 1e-100, and store them in a Python list. For matches that list multiple GenBank identifiers, only extract the first one.
import re
E_cutoff = 1e-100
gb_list = []
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect <= E_cutoff:
            match = re.search(r'gb\|([\w\d\.]+)\|', alignment.title)
            if match:
                gb_id = match.group(1)
                gb_list.append(gb_id)
            else:
                print("could not find genbank identifier in ", alignment.title)
print(gb_list)
could not find genbank identifier in gi|240104274|pdb|2WFO|A Chain A, Crystal Structure Of Machupo Virus Envelope Glycoprotein Gp1
could not find genbank identifier in gi|290790109|pdb|3KAS|B Chain B, Machupo Virus Gp1 Bound To Human Transferrin Receptor 1
['AAS77647.1', 'AAS77879.1', 'AAS77633.1', 'AAS77632.1', 'AAS77639.1', 'AAS77641.1', 'AAS77637.1', 'AAS77645.1', 'AAS77631.1', 'AAS77621.1', 'AAS77636.1', 'AAX99337.1', 'AAS77634.1', 'AAS77635.1', 'AAN09942.1', 'AAX99339.1', 'AAX99329.1', 'AAS77646.1', 'AAT45081.1', 'AAX99333.1', 'AAT40455.1', 'AAX99331.1', 'AEX08376.1', 'ACU24736.1', 'ACU24728.1', 'ACU24734.1']
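To see how the regular expression behaves on different kinds of titles, it can help to try it on a few strings in isolation. The sketch below wraps the search in a helper, `first_genbank_id`, which is a hypothetical name introduced here. It uses the character class `[\w.]`, which matches the same characters as `[\w\d\.]` because `\w` already includes digits. The three titles are copied from the BLAST output shown earlier.

```python
import re

# Matches "gb|", then captures letters, digits, underscores, and periods up to "|".
GB_PATTERN = re.compile(r'gb\|([\w.]+)\|')


def first_genbank_id(title):
    """Return the first GenBank identifier in a BLAST title, or None."""
    match = GB_PATTERN.search(title)
    return match.group(1) if match else None


titles = [
    # one GenBank identifier
    "gi|45825964|gb|AAS77647.1| glycoprotein 1, partial [Machupo mammarenavirus]",
    # two GenBank identifiers; only the first should be returned
    "gi|45825948|gb|AAS77639.1| glycoprotein 1, partial [Machupo mammarenavirus] "
    ">gi|45825950|gb|AAS77640.1| glycoprotein 1, partial [Machupo mammarenavirus]",
    # PDB entry without any GenBank identifier
    "gi|240104274|pdb|2WFO|A Chain A, Crystal Structure Of Machupo Virus Envelope Glycoprotein Gp1",
]

for title in titles:
    print(first_genbank_id(title))
```

Because `re.search()` returns the leftmost match, titles with several identifiers yield only the first one. Titles without a `gb|...|` field yield `None`, which corresponds to the "could not find" messages above.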
Problem 3:
Using the list of GenBank identifiers obtained in the previous exercise, download the corresponding sequences from GenBank and print them out in FASTA format. Hint: You will have to specify the database as "protein" for this to work, since the previous exercise generated identifiers for protein sequences.
Hint: Use the function SeqIO.write() to output your results in FASTA format, and use sys.stdout from the sys module as your output handle.
from Bio import Entrez, SeqIO
import sys
Entrez.email = "wilke@austin.utexas.edu" # put your email here
handle = Entrez.efetch(db="protein", id=gb_list, rettype="gb", retmode="text")
records = SeqIO.parse(handle, "genbank")
for record in records:
    SeqIO.write(record, sys.stdout, "fasta")
# Important: close the handle only after you have iterated over the records.
# Otherwise you will get an error!
handle.close()
>AAS77647.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDG TFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMK EYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADESRVYGKKIRNGMRHLFRGFYDPCEEGKVCYVTINQCGDPSSFEYCGTNYLSKCQFD HVNTLHFLVRSKTHLNF >AAS77879.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCSDG TFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVSVLMK EYDVSVYEPEDLGNCLNKSDSSWAIHWFSIALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADESRVYGKKIRNGMRHLFRGFYDPCEEGKVCYVTINQCGDPSSFEYCGTNYLSKCQFD HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE LMSVPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNISEFRNDWILESDHLISEMLSK EYAERQSKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSLGGCRC GKYPRLKKPTVWHRRH >AAS77633.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLVSFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDVSVLMK EYDVSVYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNRTKKEGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGTNYLSKCQFD HVNTLHFLVRSKTHLNF >AAS77632.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK EYDVSVYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNRTKKEGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGTNYLSKCQFD HVNTLHFLVRSKTHLNF >AAS77639.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK EYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD HVNTLHFLVRSKTHLNF >AAS77641.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG 
TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDISVLMK EYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD HVNTLHFLVRSKTHLNF >AAS77637.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD HVNTLHFLVRSKTHLNF >AAS77645.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCILNNNFYYMKGGVNTFLIRVSDISVLMK EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDTRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD HVNTLHFLVRSKTHLNF >AAS77631.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDVSVLMK EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD HVNTLHFLVRSKTHLNF >AAS77621.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYYMKGGVNTFLIRVSDVSVLMK EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD HVNTLHFLVRSKTHLNF >AAS77636.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSFDYCGMDHLSKCQFD HVNTLHFLVRSKTHLNF >AAX99337.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK 
EYDVSVYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNRTKKEGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGTNYLSKCQFD HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTINALISDNLLMKNKIKE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGLPTHRHLKGEACPLPHRLDSFGGCRC GKYPRLKKPTVWHRRH >AAS77634.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSFDYCGTDHLSKCQFD HVNTLHFLVRRKTHLNF >AAS77635.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLTK EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSLDYCGTDHLSKCQFD HVNTLHFLVRSKTHLNF >AAN09942.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDISVLMK EYDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDARVYGKKIRNGMRHLFRGFHDPCEEGKVCYLTINQCGDPSSFDYCGVNHLSKCQFD HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRC GKYPRLKKPTIWHKRH >AAX99339.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMRGGVNTFLIRVSDVSVLMK EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA 
VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRC GKYPRLKKPTVWHRRH >AAX99329.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGVINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSSELPSLCMLNNSFYYMKGGVNTFLIRVSDVSVLMK EYDVSIYEPEDLGNCLNKSDSSWAVHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDTKVYGKKIRNGMRHLFRGFHDLCEEGKVCYLTINQCGDPSSFDYCNTNYLSKCQFD HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRC GKYPRLKKPTVWHRRH >AAS77646.1 glycoprotein 1, partial [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVLMK EHDVSVYEPEDLGNCLNKSDSSWAVHWLSNALGHDWLMDSPMLCRNKTKMEGSNIQLNIS KADDARVYGKKIRNGMRHLFRGFHDSCEEGKLCYLTINQCGDPSSFDYCSTNHLSKCQFD HVNTLHFLVRSKSHLNF >AAT45081.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMK EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKKEGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGKKCYLTINQCGDPSSFDYCGMDHLSKCQFD HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTVFFTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRC GKYPRLRKPTIWHKRH >AAX99333.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLTK EHDVSIYEPEDLGNCLNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKREGSNIQFNIS KADDVRVYGKKIRNGMRHLFRGFHDPCEEGRKCYLTINQCGDPSSLDYCGTDHLSKCQFD 
HVNTLHFLVRSKTHLNFERSLKAFFSWSLTDXSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTVFFTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRC GKYPRLRKPTIWHRRH >AAT40455.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSLYYMKGGVNTFLIRVSSVSVLMK EHDVSVYEPEDLGNCLNKSDSSWAVHWLSNALGHDWLMDSPMLCRNKTKMEGSNIQLNIS KADDARVYGKKIRNGMRHLFRGFHDSCEEGKLCYLTINQCGDPSSFDYCSTNHLSKCQFD HVNTLHFLVRSKSHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIKE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSLGGCRC GKYPRLKKPTVWHRRH >AAX99331.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVLSR EHDVSVYEPEDLENCFNKSDSSWAIHWFSNALGHDWLMDPPMLCRNKTKVEGSNIQFNIS KADDTKVYGKKIRNGMRHLFRGFYDLCEEGKVCYLTINQCGDPSSFDYCNTSYLSKCQFD HVNTLQFLVRSKTHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYVERQGKTPITLVDICFWSTVFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRC GKYPRLKKPTVWHRRH >AEX08376.1 glycoprotein precursor [Machupo mammarenavirus] MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLLLXGRSCSDG TFKIGLHTEFQSVTLTMQRLLANHSNELPSLCMLNNSXYYMKGGVNTFLIRVSSVSVLMK EXDVSVYEPEDLGNCLNKSDSSWAVHWLSNALGHDWLMDSPMLCRNKTXMEGSNIQLNIS KADDARVYGKKIRNGMRHLFRGFHDSCEEGKLCYLTINQCGDPSSFDYCSTNHLSKCQFD HVNTLHFLVRSKSHLNFERSLKAFFSWSLTDSSGKDMPGGYCLEEWMLIXAKMKCFGNTA VAKCNQNHDSEFCDMLRLFDYNKNAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIKE LMSIPYCNYTKFWYVNHTLTGQHTLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSK EYAERQGKTPITLVDICFWSTIFFTASLFLHLVGIPTHRHLKGEACPLPHRLDSLGGCRC GKYPRLKKPTVWHRRH >ACU24736.1 glycoprotein precursor, partial 
[Machupo mammarenavirus] VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLAN HSNELPSLCMLNNSFYYMKGGVNTFLIRVSDISVLMKEHDVSIYEPEDLGNCLNKSDSSW AIHWFSNALGHDWLMDPPMLCRNKTKPEGSNIELNISKADDVRVYGKKIRNGMRHLFRGF HDSCEEGKKCYLTINQCGDPSSIDYCNTGHLSKCQFDHVNTLHFLVRSKTHLNFERSLKA FFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTAVAKCNQNHDSEFCDMLRLFDYNK NAIKTLNDESKKEINLLSQTVNALISDNLLMKNKIKELMSIPYCNYTKFWYVNHTLTGQH TLPRCWLIKNGSYLNTSEFRNDWILESDHLISEMLSKEYAERQGKTPITLVDICFWSTVF FTASLFLHLVGIPTHRHLKGEACPLPHKLDSFGGCRCGKYPRLRKPTIWHKRH >ACU24728.1 glycoprotein precursor, partial [Machupo mammarenavirus] VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQRLLAN HSNELPSLCMLNNSFYYMKGGVNTFLIRVSSISVLSREHDVLVHEPDDLENCLNKSDSSW AIHWFSNALGHDWLMDPPMLCRNKTKVEGSNIQFNISKADDTKVYGKKIRNGMRHLFRGF HDLCEEGKVCYLTINQCGDPSSFDYCSTSYLSKCQFDHVNTLHFLVRSKTHLNFERSLKA FFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTAVAKCNQNHDSEFCDMLRLFDYNK NAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRELMSIPYCNYTKFWYVNHTLTGQH TLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSKEYAERQGKTPITLVDICFWSTVF FTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRCGKYPRLKKPTVWHRRH >ACU24734.1 glycoprotein precursor, partial [Machupo mammarenavirus] VAVSLIAVIKGIINLYKSGLFQFIFFLLLAGRSCSDGTFKIGLHTEFQSVTLTMQGLLAN HSNELPSLCMLNNSFYYMKGGVNTFLIRVSSVSVVSREHDVLVHEPEDLGNCLNESDSSW ALHWFSNALGHDWLVDPPMLCRNKTKVEGSNIQFNISKADDTKVYGKKIRNGMRHLFRGF HDLCEEGKVCYLTINQCGDPSSFDYCDTNHLSKCQFDHVNTLHFLVRSKTHLNFERSLKA FFSWSLTDSSGKDMPGGYCLEEWMLIAAKMKCFGNTAVAKCNQNHDSEFCDMLRLFDYNK NAIKTLNDESKKEINFLSQTVNALISDNLLMKNKIRELMSIPYCNYTKFWYVNHTLTGQH TLPRCWLIRNGSYLNTSEFRNDWILESDHLISEMLSKEYVERQGKTPITLVDICFWSTVF FTASLFLHLVGIPTHRHLKGEACPLPHRLDSFGGCRCGKYPRLKKPTVWHRRH
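The FASTA records above consist of a header line starting with `>` followed by the sequence wrapped at 60 characters per line. To make the format concrete, here is a minimal standard-library sketch that produces the same layout by hand. The function `to_fasta`, the identifier `query`, and the description text are hypothetical and for illustration only. In practice `SeqIO.write()` does this for you.

```python
def to_fasta(identifier, description, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width sequence lines."""
    header = f">{identifier} {description}"
    # Slice the sequence into chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


# First two lines of the query sequence given at the top of this worksheet.
sequence = (
    "MGQLISFFQEIPVFLQEALNIALVAVSLIAVIKGIINLYKSGLFQFIFFLFLAGRSCS"
    "DGTFKIGLHTEFQSVTFTMQRLLANHSNELPSLCMLNNSFYYMKGGANIFLIRVSDVS"
)

record_text = to_fasta("query", "Machupo virus glycoprotein (fragment)", sequence)
print(record_text, end="")
```

Joining the wrapped lines back together recovers the original sequence, so the line width affects only presentation and not content.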