Automate ingest and phylogenetic workflows #22

Merged 8 commits on Apr 10, 2024

Commits on Apr 5, 2024

  1. Start GH Action workflow for automation

    Currently just runs the ingest workflow and uploads the results
    to AWS S3. Subsequent commits will add automation for the phylogenetic
    workflow.
    
    Follows Zika PR #52
    
    nextstrain/zika@d44f2ae
    kimandrews committed Apr 5, 2024
    1c33f89
  2. ingest-to-phylogenetic: Add phylogenetic job

    The phylogenetic workflow will run after the ingest workflow has
    completed successfully to use the latest available data.
    
    Subsequent commits will check if the ingest results included new
    data to only run the phylogenetic workflow when there's new data.
    
    Following Zika PR #52
    
    nextstrain/zika@2c415e7
    kimandrews committed Apr 5, 2024
    cf39221
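
    As a rough sketch of the ordering this commit describes, a `needs:`
    dependency makes the phylogenetic job wait for the ingest job. The job
    names and `run` steps below are placeholders for illustration, not the
    repository's actual workflow.

    ```yaml
    # Hypothetical job names and steps, for illustration only.
    jobs:
      ingest:
        runs-on: ubuntu-latest
        steps:
          - run: echo "run the ingest workflow and upload results to S3"

      phylogenetic:
        # `needs` delays this job until `ingest` finishes. By default the
        # job is skipped if a job it needs fails or is skipped.
        needs: [ingest]
        runs-on: ubuntu-latest
        steps:
          - run: echo "run the phylogenetic workflow on the latest data"
    ```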
  3. ingest-to-phylogenetic: Use cache to check new data

    Uses the GitHub Actions cache to store a file that contains the
    `Metadata.sha256sum` of the ingest files on S3, and uses the
    `hashFiles` function to create a unique cache key from that file.
    
    The existence of the cache key then indicates that the ingest
    file contents have not changed since a previous run on GH Actions.
    One big caveat: GH removes any cache entries that have not been
    accessed in over 7 days.¹ If the workflow does not run at least once
    every 7 days, the cache entry will be gone and the phylogenetic job
    will always run.
    
    If this works well, then we may want to consider moving this within
    the `pathogen-repo-build` reusable workflow to have the same
    functionality across pathogen automation workflows.
    
    ¹ https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
    
    Follows Zika PR #52
    
    nextstrain/zika@eb5e76d
    kimandrews committed Apr 5, 2024
    2a28964
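
    The cache check described in this commit could look roughly like the
    sketch below. It assumes AWS credentials and a region are already
    configured for the job. The bucket, object key, file name, and job names
    are hypothetical, and only one S3 object is checked to keep the example
    short.

    ```yaml
    # Bucket, key, file, and job names are hypothetical.
    jobs:
      check-new-data:
        runs-on: ubuntu-latest
        outputs:
          cache-hit: ${{ steps.check-cache.outputs.cache-hit }}
        steps:
          - name: Record the sha256sum stored in the S3 object's metadata
            run: |
              aws s3api head-object \
                --bucket example-bucket \
                --key example/metadata.tsv.zst \
                --query Metadata.sha256sum \
                --output text > ingest-output-sha256sum

          - name: Check cache
            id: check-cache
            uses: actions/cache@v4
            with:
              path: ingest-output-sha256sum
              # The key changes whenever the recorded hash changes.
              key: ingest-output-sha256sum-${{ hashFiles('ingest-output-sha256sum') }}
              # Only test whether the key exists; skip downloading the entry.
              lookup-only: true

      phylogenetic:
        needs: [check-new-data]
        # A cache hit means an earlier run already saw the same hash.
        if: needs.check-new-data.outputs.cache-hit != 'true'
        runs-on: ubuntu-latest
        steps:
          - run: echo "run the phylogenetic workflow"
    ```

    On a cache miss, the cache action saves the file under the new key when
    the job finishes, so the next run with unchanged data gets a hit and
    skips the phylogenetic job.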
  4. ingest-to-phylo: Add inputs for Docker image

    Add individual inputs per workflow to override the default Docker image
    used by `nextstrain build`. Having this input has been extremely helpful
    to continue running pathogen workflows when we run into new bugs that
    are not present in older nextstrain-base images.
    
    There are separate image inputs for the two workflows because they use
    different tools and may require different versions of images.
    
    Follows Zika PR #52
    
    nextstrain/zika@65a8acc
    kimandrews committed Apr 5, 2024
    bb1abd1
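
    A minimal sketch of per-workflow image inputs follows. The input and job
    names are hypothetical. It assumes the Nextstrain CLI is installed in an
    earlier step, that it reads the `NEXTSTRAIN_DOCKER_IMAGE` environment
    variable, and that an empty value falls back to the default image.

    ```yaml
    # Input and job names are hypothetical.
    on:
      workflow_dispatch:
        inputs:
          ingest_image:
            description: "Docker image for the ingest workflow (empty = default)"
            required: false
            type: string
          phylogenetic_image:
            description: "Docker image for the phylogenetic workflow (empty = default)"
            required: false
            type: string

    jobs:
      ingest:
        runs-on: ubuntu-latest
        env:
          # Assumption: an empty value leaves the default image in place.
          NEXTSTRAIN_DOCKER_IMAGE: ${{ inputs.ingest_image }}
        steps:
          - uses: actions/checkout@v4
          # Assumes the Nextstrain CLI was installed by an earlier step.
          - run: nextstrain build ingest
    ```

    Keeping one input per workflow lets a manual run pin an older image for
    one workflow without affecting the other.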
  5. ingest-to-phylo: Add schedule

    Copied daily schedule of mpox ingest
    https://github.com/nextstrain/mpox/blob/e439235ff1c1d66e7285b774e9536e2896d9cd2f/.github/workflows/fetch-and-ingest.yaml#L4-L21
    
    Daily runs seem fine since the ingest workflow currently takes less
    than 2 minutes to complete and it will not trigger the phylogenetic
    workflow if there's no new data.
    
    We can bring this down to once a week if it seems like overkill.
    
    Follows Zika PR #52
    
    nextstrain/zika@77ca1d4
    kimandrews committed Apr 5, 2024
    ce7f5bc
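
    For reference, a daily trigger in GitHub Actions is written as a cron
    entry under `on.schedule`. The time of day below is a placeholder and is
    not taken from the mpox workflow.

    ```yaml
    on:
      schedule:
        # Hypothetical time: every day at 14:00 UTC. Cron schedules are
        # evaluated in UTC and runs can start late under heavy load.
        - cron: "0 14 * * *"
        # A weekly alternative would be, e.g., Mondays only: "0 14 * * 1"
      # Keep manual runs available alongside the schedule.
      workflow_dispatch:
    ```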

Commits on Apr 10, 2024

  1. ec566f9
  2. ingest-to-phylogenetic: Add AWS_DEFAULT_REGION

    Follows Zika PR #52
    nextstrain/zika@f615170
    
    Uses the variable `AWS_DEFAULT_REGION` that was added to the
    Nextstrain GitHub organization variables.¹
    
    ¹ https://github.com/organizations/nextstrain/settings/variables/actions
    kimandrews committed Apr 10, 2024
    671df77
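
    One way an organization-level variable reaches a job is through the
    `vars` context, sketched below. The job name, step, and bucket are
    hypothetical; which jobs in the actual workflow set this variable is not
    stated here.

    ```yaml
    # Job name, step, and bucket are hypothetical.
    jobs:
      check-new-data:
        runs-on: ubuntu-latest
        env:
          # `vars` exposes configuration variables defined at the
          # organization, repository, or environment level.
          AWS_DEFAULT_REGION: ${{ vars.AWS_DEFAULT_REGION }}
        steps:
          - run: aws s3 ls s3://example-bucket/
    ```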
  3. update Changelog

    kimandrews committed Apr 10, 2024
    0d42723