Use nextstrain/ingest #412

Merged: 6 commits, Aug 24, 2023

Member

@victorlin victorlin commented Jul 31, 2023

Description of proposed changes

Begin usage of centralized ingest scripts and add details on how to pull new updates from the central repo.

Related issue(s)

Testing

  • Checks pass (Update-image failure is unrelated and can be ignored. It was triggered by a non-functional change in update-image.yml).

  • Run GenBank fetch and ingest on PR branch, verify successful run.

    Evidence

    I checked for the presence of ✅ This pipeline has successfully finished 🎉 in the AWS Batch logs.

    I also spot checked some lines in the generated metadata file:

    # Download and decompress metadata files from latest rebuild and PR rebuild.
    aws s3api get-object --bucket nextstrain-data --key files/ncov/open/metadata.tsv.zst metadata-current.tsv.zst --version-id 3wF7ccfqiJ73cw7dVjq.mlrPf5A6Vah6
    unzstd metadata-current.tsv.zst
    aws s3 cp s3://nextstrain-staging/files/ncov/open/branch/victorlin/centralized-ingest-git-subrepo/metadata.tsv.zst metadata-branch.tsv.zst
    unzstd metadata-branch.tsv.zst
    
    # There is no difference in the first and last 10k lines.
    # Any other differences can be attributed to differences in data availability at the time of each run.
    diff <(head -n 10000 metadata-current.tsv) <(head -n 10000 metadata-branch.tsv)
    diff <(tail -n 10000 metadata-current.tsv) <(tail -n 10000 metadata-branch.tsv)
  • Run GISAID fetch and ingest on PR branch, verify successful run.

    Evidence

    Since I already checked an output file in the GenBank run, here I just checked for the presence of ✅ This pipeline has successfully finished 🎉 in the AWS Batch logs.

  • Post-merge: Slack notifications still work as intended.
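
The head/tail spot check from the GenBank evidence above could be wrapped in a small reusable helper. This is only a sketch: the function name `compare_ends` and its default of 10000 lines are assumptions for illustration and are not part of this PR. It uses temporary files instead of process substitution so it also runs under a plain POSIX `sh`.

```shell
# Hypothetical helper (not part of this PR): compare the first and last
# N lines of two files. Returns 0 if both ends match, 1 if either differs,
# 2 if a temporary directory could not be created.
compare_ends() {
    ce_file_a=$1
    ce_file_b=$2
    ce_n=${3:-10000}
    ce_tmp=$(mktemp -d) || return 2

    # Extract the first and last N lines of each file.
    head -n "$ce_n" "$ce_file_a" > "$ce_tmp/a.head"
    head -n "$ce_n" "$ce_file_b" > "$ce_tmp/b.head"
    tail -n "$ce_n" "$ce_file_a" > "$ce_tmp/a.tail"
    tail -n "$ce_n" "$ce_file_b" > "$ce_tmp/b.tail"

    # Record a failure if either end differs; diff output is discarded.
    ce_status=0
    diff "$ce_tmp/a.head" "$ce_tmp/b.head" > /dev/null || ce_status=1
    diff "$ce_tmp/a.tail" "$ce_tmp/b.tail" > /dev/null || ce_status=1

    rm -rf "$ce_tmp"
    return "$ce_status"
}
```

With the two decompressed files from the evidence, the call would be `compare_ends metadata-current.tsv metadata-branch.tsv`. A zero exit status only says the ends match; the middle of the files is not compared.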

@victorlin victorlin self-assigned this Jul 31, 2023
@victorlin victorlin force-pushed the victorlin/centralized-ingest-git-subrepo branch 2 times, most recently from 3cffe01 to aafc680 Compare August 11, 2023 13:08
subrepo:
  subdir:   "vendored"
  merged:   "1eb8b30"
upstream:
  origin:   "https://github.com/nextstrain/ingest"
  branch:   "main"
  commit:   "1eb8b30"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/ingydotnet/git-subrepo"
  commit:   "110b9eb"
The previous commit was created by the following command:

    git subrepo clone https://github.com/nextstrain/ingest vendored

Add a section in the README on how to use this directory in the future.
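
The README section itself is not shown in this conversation. As a rough sketch of what pulling later updates from the central repo might look like, assuming the standard git-subrepo workflow and the `vendored` subdirectory used in the clone command above, a thin wrapper could be written as follows. The function name `pull_vendored` and the `DRY_RUN` switch are hypothetical and exist only to make the example safe to try.

```shell
# Hypothetical wrapper (not from this PR) around `git subrepo pull`.
# With DRY_RUN=1 it prints the command instead of running it.
pull_vendored() {
    pv_subdir=${1:-vendored}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "git subrepo pull $pv_subdir"
    else
        # Requires git-subrepo to be installed and a clean working tree.
        git subrepo pull "$pv_subdir"
    fi
}
```

Running `DRY_RUN=1 pull_vendored` shows the command that would be executed without touching the repository.
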
Remove the copies in this repo and update references.
Both the centralized `trigger` and `trigger-on-new-data` take an
owner/repo pair as the first argument.
Remove the copies in this repo and update references.

Add new positional arguments required by the centralized scripts.
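
Because the centralized `trigger` and `trigger-on-new-data` scripts expect an owner/repo pair as their first argument, a caller might want to validate that value before passing it along. The check below is an illustrative sketch only: `is_owner_repo` is a hypothetical name, and the scripts' remaining arguments are not shown in this conversation, so they are left out.

```shell
# Hypothetical check (not from this PR): succeed only if the argument
# has the shape "owner/repo" with exactly one slash and non-empty parts.
is_owner_repo() {
    case "$1" in
        */*/*) return 1 ;;   # more than one slash
        ?*/?*) return 0 ;;   # non-empty owner and repo
        *)     return 1 ;;   # no slash, or an empty part
    esac
}
```

For example, `is_owner_repo example-org/example-repo` succeeds, while `is_owner_repo example-repo` fails.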
@victorlin victorlin force-pushed the victorlin/centralized-ingest-git-subrepo branch from aafc680 to 6ce947e Compare August 15, 2023 18:49
@victorlin victorlin marked this pull request as ready for review August 22, 2023 17:59
Contributor

@joverlee521 joverlee521 left a comment

LGTM 👍

@victorlin victorlin merged commit 21b2eb8 into master Aug 24, 2023
3 of 4 checks passed
@victorlin victorlin deleted the victorlin/centralized-ingest-git-subrepo branch August 24, 2023 22:47