vdb/flu_upload: only prefix accession with "EPI" if needed
While digging through git history, I found an example of a FASTA header that
suggests GISAID did not previously include the "EPI" prefix in its
"DNA Accession no." field.¹

This must have changed around October 2023, because we have "EPIEPI"
accessions for sequences submitted after September 27th, 2023.²

This commit changes the ingest to only prefix with "EPI" if the accession
does not already have the prefix.
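
A minimal sketch of the prefixing rule described above, written as a standalone function for illustration. The name `ensure_epi_prefix` and the accession values are hypothetical and not part of fauna; the actual change lives inline in `fix_casing`.

```python
def ensure_epi_prefix(accession):
    """Hypothetical helper (not in fauna): add "EPI" only when it is missing."""
    if accession is None:
        # Mirror the ingest behavior: leave missing accessions untouched.
        return None
    if accession.startswith('EPI'):
        # Already prefixed, so return it unchanged and avoid "EPIEPI".
        return accession
    return 'EPI' + accession


# Illustrative accession values, not real GISAID records.
print(ensure_epi_prefix('123456'))
print(ensure_epi_prefix('EPI123456'))
```

Because the check makes the operation idempotent, running the ingest on either header style should produce the same accession.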

¹ https://github.com/nextstrain/fauna/blame/f485baa3621002b3ff6f833c743180239a92bf14/vdb/gisaid_flu_upload.py#L281-L282
² https://bedfordlab.slack.com/archives/C03KWDET9/p1700609695217959
joverlee521 committed Nov 22, 2023
1 parent 14c500b commit ab32ebd
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion vdb/flu_upload.py
@@ -209,7 +209,7 @@ def fix_casing(self, doc):
         for field in ['gender', 'host', 'locus']:
             if field in doc and doc[field] is not None:
                 doc[field] = self.camelcase_to_snakecase(doc[field])
-        if 'accession' in doc and doc['accession'] is not None:
+        if doc.get('accession') is not None and not doc['accession'].startswith('EPI'):
             doc['accession'] = 'EPI' + doc['accession']
 
     def fix_age(self, doc):
