Use nextstrain/ingest #412

Merged (6 commits, Aug 24, 2023)
4 changes: 2 additions & 2 deletions .github/workflows/fetch-and-ingest-genbank-master.yml
@@ -22,7 +22,7 @@ on:
 # sister GISAID job, so that we don't need to keep two schedules in our heads.
 - cron: '7 18 * * *'

-# Manually triggered using `./bin/trigger ncov-ingest genbank/fetch-and-ingest` (or `fetch-and-ingest`, which
+# Manually triggered using `./vendored/trigger nextstrain/ncov-ingest genbank/fetch-and-ingest` (or `fetch-and-ingest`, which
 # includes GISAID)
 repository_dispatch:
 types:
@@ -82,4 +82,4 @@ jobs:

 - name: notify_pipeline_failed
 if: ${{ failure() }}
-run: ./bin/notify-on-job-fail
+run: ./vendored/notify-on-job-fail Ingest nextstrain/ncov-ingest
4 changes: 2 additions & 2 deletions .github/workflows/fetch-and-ingest-gisaid-master.yml
@@ -22,7 +22,7 @@ on:
 # sister GenBank job, so that we don't need to keep two schedules in our heads.
 - cron: '7 18 * * *'

-# Manually triggered using `./bin/trigger ncov-ingest gisaid/fetch-and-ingest`
+# Manually triggered using `./vendored/trigger nextstrain/ncov-ingest gisaid/fetch-and-ingest`
 repository_dispatch:
 types:
 - gisaid/fetch-and-ingest
@@ -85,4 +85,4 @@ jobs:

 - name: notify_pipeline_failed
 if: ${{ failure() }}
-run: ./bin/notify-on-job-fail
+run: ./vendored/notify-on-job-fail Ingest nextstrain/ncov-ingest
4 changes: 2 additions & 2 deletions .github/workflows/ingest-genbank-master.yml
@@ -1,7 +1,7 @@
 name: GenBank ingest

 on:
-# Manually triggered using `./bin/trigger ncov-ingest genbank/ingest` (or `ingest`, which
+# Manually triggered using `./vendored/trigger nextstrain/ncov-ingest genbank/ingest` (or `ingest`, which
 # includes GISAID)
 repository_dispatch:
 types:
@@ -51,4 +51,4 @@ jobs:

 - name: notify_pipeline_failed
 if: ${{ failure() }}
-run: ./bin/notify-on-job-fail
+run: ./vendored/notify-on-job-fail Ingest nextstrain/ncov-ingest
4 changes: 2 additions & 2 deletions .github/workflows/ingest-gisaid-master.yml
@@ -1,7 +1,7 @@
 name: GISAID ingest

 on:
-# Manually triggered using `./bin/trigger ncov-ingest gisaid/ingest` (or `ingest`, which
+# Manually triggered using `./vendored/trigger nextstrain/ncov-ingest gisaid/ingest` (or `ingest`, which
 # includes GenBank)
 repository_dispatch:
 types:
@@ -51,4 +51,4 @@ jobs:

 - name: notify_pipeline_failed
 if: ${{ failure() }}
-run: ./bin/notify-on-job-fail
+run: ./vendored/notify-on-job-fail Ingest nextstrain/ncov-ingest
2 changes: 1 addition & 1 deletion .github/workflows/update-image.yml
@@ -15,7 +15,7 @@ on:
 - yarn.lock
 - .github/workflows/update-image.yml

-# Manually triggered using `./bin/trigger ncov-ingest update-image`
+# Manually triggered using `./vendored/trigger nextstrain/ncov-ingest update-image`
 repository_dispatch:
 types: update-image
25 changes: 19 additions & 6 deletions README.md
@@ -76,18 +76,18 @@ AWS credentials are stored in this repository's secrets and are associated with

 A full run is now done in 3 steps via manual triggers:

-1. Fetch new sequences and ingest them by running `./bin/trigger ncov-ingest gisaid/fetch-and-ingest --user <your-github-username>`.
+1. Fetch new sequences and ingest them by running `./vendored/trigger nextstrain/ncov-ingest gisaid/fetch-and-ingest --user <your-github-username>`.
 2. Add manual annotations, update location hierarchy as needed, and run ingest without fetching new sequences.
    - Pushes of `source-data/*-annotations.tsv` to the master branch will automatically trigger a run of ingest.
-   - You can also run ingest manually by running `./bin/trigger ncov-ingest gisaid/ingest --user <your-github-username>`.
-3. Once all manual fixes are complete, trigger a rebuild of [nextstrain/ncov](https://github.com/nextstrain/ncov) by running `./bin/trigger ncov gisaid/rebuild --user <your-github-username>`.
+   - You can also run ingest manually by running `./vendored/trigger nextstrain/ncov-ingest gisaid/ingest --user <your-github-username>`.
+3. Once all manual fixes are complete, trigger a rebuild of [nextstrain/ncov](https://github.com/nextstrain/ncov) by running `./vendored/trigger ncov gisaid/rebuild --user <your-github-username>`.

-See the output of `./bin/trigger ncov-ingest gisaid/fetch-and-ingest --user <your-github-username>`, `./bin/trigger ncov-ingest gisaid/ingest` or `./bin/trigger ncov-ingest rebuild` for more information about authentication with GitHub.
+See the output of `./vendored/trigger nextstrain/ncov-ingest gisaid/fetch-and-ingest --user <your-github-username>`, `./vendored/trigger nextstrain/ncov-ingest gisaid/ingest` or `./vendored/trigger nextstrain/ncov-ingest rebuild` for more information about authentication with GitHub.

-Note: running `./bin/trigger ncov-ingest` posts a GitHub `repository_dispatch`.
+Note: running `./vendored/trigger nextstrain/ncov-ingest` posts a GitHub `repository_dispatch`.
 Regardless of which branch you are on, it will trigger the specified action on the master branch.

-Valid dispatch types for `./bin/trigger ncov-ingest` are:
+Valid dispatch types for `./vendored/trigger nextstrain/ncov-ingest` are:

 - `ingest` (both GISAID and GenBank)
 - `gisaid/ingest`
@@ -150,3 +150,16 @@ aws s3 cp - s3://nextstrain-data/files/ncov/open/nextclade_21L.tsv.zst.renew < /
 - `AWS_SECRET_ACCESS_KEY`
 - `SLACK_TOKEN`
 - `SLACK_CHANNELS`
+
+## `vendored`
+
+This repository uses [`git subrepo`](https://github.com/ingydotnet/git-subrepo) to manage copies of ingest scripts in `vendored`, from [nextstrain/ingest](https://github.com/nextstrain/ingest). To pull new changes from the central ingest repository, first install `git subrepo`, then run:
+
+```sh
+git subrepo pull vendored
+```
+
+Changes should not be pushed using `git subrepo push`.
+
+1. For pathogen-specific changes, make them in this repository via a pull request.
+2. For pathogen-agnostic changes, make them on [nextstrain/ingest](https://github.com/nextstrain/ingest) via pull request there, then use `git subrepo pull` to add those changes to this repository.
4 changes: 2 additions & 2 deletions Snakefile
@@ -92,7 +92,7 @@ onstart:
 print(f"\t${{{var}}}: " + ("YES" if os.environ.get(var, "") else "NO") + f"({description})")
 if send_notifications:
 message="🥗 GISAID ingest" if database=="gisaid" else "🥣 GenBank ingest"
-shell(f"./bin/notify-on-job-start \"{message}\"")
+shell(f"./vendored/notify-on-job-start \"{message}\" nextstrain/ncov-ingest")

 onsuccess:
 message = "✅ This pipeline has successfully finished 🎉"
@@ -104,7 +104,7 @@ onsuccess:
 onerror:
 print("Pipeline failed.")
 if send_notifications:
-shell("./bin/notify-on-job-fail")
+shell("./vendored/notify-on-job-fail Ingest nextstrain/ncov-ingest")
 if not config.get("keep_all_files", False):
 print("Removing intermediate files (set config option keep_all_files to skip this)")
 shell("./bin/clean")
20 changes: 10 additions & 10 deletions bin/local-ingest-gisaid
@@ -79,8 +79,8 @@ main() {
 download-inputs() {
 mkdir -p "${INPUT_DIR}"

-./bin/download-from-s3 "${S3_BUCKET}/additional_info.tsv.gz" "data/gisaid/inputs/additional_info.tsv"
-./bin/download-from-s3 "${S3_BUCKET}/metadata.tsv.gz" "data/gisaid/inputs/metadata.tsv"
+./vendored/download-from-s3 "${S3_BUCKET}/additional_info.tsv.gz" "data/gisaid/inputs/additional_info.tsv"
+./vendored/download-from-s3 "${S3_BUCKET}/metadata.tsv.gz" "data/gisaid/inputs/metadata.tsv"
 }

 download-gisaid() {
@@ -160,16 +160,16 @@ ingest() {
 }

 upload-outputs() {
-./bin/upload-to-s3 "${OUTPUT_DIR}/metadata.tsv" "${S3_BUCKET}/metadata.tsv.gz"
-./bin/upload-to-s3 "${OUTPUT_DIR}/additional_info.tsv" "${S3_BUCKET}/additional_info.tsv.gz"
-./bin/upload-to-s3 "${OUTPUT_DIR}/flagged_metadata.txt" "${S3_BUCKET}/flagged_metadata.txt.gz"
-./bin/upload-to-s3 "${OUTPUT_DIR}/sequences.fasta" "${S3_BUCKET}/sequences.fasta.xz"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/metadata.tsv" "${S3_BUCKET}/metadata.tsv.gz"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/additional_info.tsv" "${S3_BUCKET}/additional_info.tsv.gz"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/flagged_metadata.txt" "${S3_BUCKET}/flagged_metadata.txt.gz"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/sequences.fasta" "${S3_BUCKET}/sequences.fasta.xz"

 # Parallel uploads of zstd compressed files to slowly transition to this format
-./bin/upload-to-s3 "${OUTPUT_DIR}/metadata.tsv" "${S3_BUCKET}/metadata.tsv.zst"
-./bin/upload-to-s3 "${OUTPUT_DIR}/additional_info.tsv" "${S3_BUCKET}/additional_info.tsv.zst"
-./bin/upload-to-s3 "${OUTPUT_DIR}/flagged_metadata.txt" "${S3_BUCKET}/flagged_metadata.txt.zst"
-./bin/upload-to-s3 "${OUTPUT_DIR}/sequences.fasta" "${S3_BUCKET}/sequences.fasta.zst"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/metadata.tsv" "${S3_BUCKET}/metadata.tsv.zst"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/additional_info.tsv" "${S3_BUCKET}/additional_info.tsv.zst"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/flagged_metadata.txt" "${S3_BUCKET}/flagged_metadata.txt.zst"
+./vendored/upload-to-s3 "${OUTPUT_DIR}/sequences.fasta" "${S3_BUCKET}/sequences.fasta.zst"
 }

 print-help() {
6 changes: 3 additions & 3 deletions bin/notify-on-additional-info-change
@@ -4,13 +4,13 @@ set -euo pipefail
 : "${SLACK_TOKEN:?The SLACK_TOKEN environment variable is required.}"
 : "${SLACK_CHANNELS:?The SLACK_CHANNELS environment variable is required.}"

-bin="$(dirname "$0")"
+vendored="$(dirname "$0")"/../vendored

 src="${1:?A source additional info TSV file is required as the first argument.}"
 dst="${2:?A destination additional info TSV s3:// URL is required as the second argument.}"

 # if the file is not already present, just exit
-"$bin"/s3-object-exists "$dst" || exit 0
+"$vendored"/s3-object-exists "$dst" || exit 0

 # Remove rows where columns 3 (additional_host_info) and 4 (additional_location_info) are empty.
 # Compare the S3 version with the local version.
@@ -26,7 +26,7 @@ diff="$(

 if [[ -n "$diff" ]]; then
 echo "Notifying Slack about additional info change."
-"$bin"/notify-slack --upload "additional-info-changes.txt" <<<"$diff"
+"$vendored"/notify-slack --upload "additional-info-changes.txt" <<<"$diff"
 else
 echo "No additional info change."
 fi
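The `vendored="$(dirname "$0")"/../vendored` line introduced above locates the sibling `vendored/` directory relative to the script file itself rather than the caller's working directory. A minimal sketch of the pattern (the script path and helper name are illustrative, not from this repository):

```sh
#!/bin/bash
# Suppose this script lives at <repo>/bin/notify-something.
# dirname "$0" yields "<repo>/bin" no matter where the script is
# invoked from, so "../vendored" reliably points at <repo>/vendored.
vendored="$(dirname "$0")"/../vendored

# Hypothetical invocation of a vendored helper:
# "$vendored"/notify-slack "hello"
echo "$vendored"
```

This is why the rename from `bin=` to `vendored=` is a one-line change: every call site just swaps the variable.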
10 changes: 5 additions & 5 deletions bin/notify-on-duplicate-biosample-change
@@ -4,7 +4,7 @@ set -euo pipefail
 : "${SLACK_TOKEN:?The SLACK_TOKEN environment variable is required.}"
 : "${SLACK_CHANNELS:?The SLACK_CHANNELS environment variable is required.}"

-bin="$(dirname "$0")"
+vendored="$(dirname "$0")"/../vendored

 src="${1:?A source duplicate BioSample txt file is required as the first argument.}"
 dst="${2:?A destination duplicate BioSample txt s3:// URL is required as the second argument.}"
@@ -16,9 +16,9 @@ diff="$(mktemp -t duplicate-biosample-additions-XXXXXX)"
 trap "rm -f '$dst_local' '$diff'" EXIT

 # if the file is not already present, just exit
-"$bin"/s3-object-exists "$dst" || exit 0
+"$vendored"/s3-object-exists "$dst" || exit 0

-"$bin"/download-from-s3 "$dst" "$dst_local"
+"$vendored"/download-from-s3 "$dst" "$dst_local"

 comm -13 \
 <(sort "$dst_local") \
@@ -28,8 +28,8 @@ comm -13 \

 if [[ -s "$diff" ]]; then
 echo
 echo "Notifying Slack about duplicate BioSample additions."
-"$bin"/notify-slack ":warning: Newly flagged duplicate BioSample strains"
-"$bin"/notify-slack --upload "duplicate-biosample-additions.txt" < "$diff"
+"$vendored"/notify-slack ":warning: Newly flagged duplicate BioSample strains"
+"$vendored"/notify-slack --upload "duplicate-biosample-additions.txt" < "$diff"
 else
 echo "No flagged duplicate BioSample additions."
 fi
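The `comm -13` pipeline in this script reports only entries present in the freshly built list but absent from the previously uploaded S3 copy, i.e. the additions. A minimal local sketch of the same pattern (strain names are made up for illustration):

```sh
# Two sorted inputs: the previously uploaded list and the new one.
old=$(mktemp); new=$(mktemp)
printf 'strain_a\nstrain_b\n' > "$old"
printf 'strain_a\nstrain_b\nstrain_c\n' > "$new"

# -1 suppresses lines unique to the old file, -3 suppresses lines common
# to both, leaving only the additions (here: strain_c).
comm -13 <(sort "$old") <(sort "$new")

rm -f "$old" "$new"
```

Sorting both sides first matters: `comm` assumes its inputs are sorted and produces wrong output otherwise.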
10 changes: 5 additions & 5 deletions bin/notify-on-flagged-metadata-change
@@ -4,7 +4,7 @@ set -euo pipefail
 : "${SLACK_TOKEN:?The SLACK_TOKEN environment variable is required.}"
 : "${SLACK_CHANNELS:?The SLACK_CHANNELS environment variable is required.}"

-bin="$(dirname "$0")"
+vendored="$(dirname "$0")"/../vendored

 src="${1:?A source flagged metadata txt file is required as the first argument.}"
 dst="${2:?A destination flagged metadata txt s3:// URL is required as the second argument.}"
@@ -16,9 +16,9 @@ diff="$(mktemp -t flagged-metadata-additions-XXXXXX)"
 trap "rm -f '$dst_local' '$diff'" EXIT

 # if the file is not already present, just exit
-"$bin"/s3-object-exists "$dst" || exit 0
+"$vendored"/s3-object-exists "$dst" || exit 0

-"$bin"/download-from-s3 "$dst" "$dst_local"
+"$vendored"/download-from-s3 "$dst" "$dst_local"

 comm -13 \
 <(sort "$dst_local") \
@@ -28,8 +28,8 @@ comm -13 \

 if [[ -s "$diff" ]]; then
 echo
 echo "Notifying Slack about flagged metadata additions."
-"$bin"/notify-slack ":waving_black_flag: Newly flagged metadata"
-"$bin"/notify-slack --upload "flagged-metadata-additions.txt" < "$diff"
+"$vendored"/notify-slack ":waving_black_flag: Newly flagged metadata"
+"$vendored"/notify-slack --upload "flagged-metadata-additions.txt" < "$diff"
 else
 echo "No flagged metadata additions."
 fi
21 changes: 0 additions & 21 deletions bin/notify-on-job-fail

This file was deleted.

26 changes: 0 additions & 26 deletions bin/notify-on-job-start

This file was deleted.

4 changes: 2 additions & 2 deletions bin/notify-on-problem-data
@@ -4,13 +4,13 @@ set -euo pipefail
 : "${SLACK_TOKEN:?The SLACK_TOKEN environment variable is required.}"
 : "${SLACK_CHANNELS:?The SLACK_CHANNELS environment variable is required.}"

-bin="$(dirname "$0")"
+vendored="$(dirname "$0")"/../vendored

 problem_data="${1:?A problem data TSV file is required as the first argument.}"

 if [[ -s "$problem_data" ]]; then
 echo "Notifying Slack about problem data."
-"$bin"/notify-slack --upload "genbank-problem-data.tsv" < "$problem_data"
+"$vendored"/notify-slack --upload "genbank-problem-data.tsv" < "$problem_data"
 else
 echo "No problem data found."
 fi
16 changes: 16 additions & 0 deletions vendored/.github/pull_request_template.md
@@ -0,0 +1,16 @@
+### Description of proposed changes
+
+<!-- What is the goal of this pull request? What does this pull request change? -->
+
+### Related issue(s)
+
+<!-- Link any related issues here. -->
+
+### Checklist
+
+<!-- Make sure checks are successful at the bottom of the PR. -->
+
+- [ ] Checks pass
+- [ ] If adding a script, add an entry for it in the README.
+
+<!-- 🙌 Thank you for contributing to Nextstrain! ✨ -->
13 changes: 13 additions & 0 deletions vendored/.github/workflows/ci.yaml
@@ -0,0 +1,13 @@
+name: CI
+
+on:
+  - push
+  - pull_request
+  - workflow_dispatch
+
+jobs:
+  shellcheck:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: nextstrain/.github/actions/shellcheck@master
12 changes: 12 additions & 0 deletions vendored/.gitrepo
@@ -0,0 +1,12 @@
+; DO NOT EDIT (unless you know what you are doing)
+;
+; This subdirectory is a git "subrepo", and this file is maintained by the
+; git-subrepo command. See https://github.com/ingydotnet/git-subrepo#readme
+;
+[subrepo]
+remote = https://github.com/nextstrain/ingest
+branch = main
+commit = 1eb8b30428d5f66adac201f0a246a7ab4bdc9792
+parent = 6fd5a9b1d87e59fab35173dbedf376632154943b
+method = merge
+cmdver = 0.4.6
6 changes: 6 additions & 0 deletions vendored/.shellcheckrc
@@ -0,0 +1,6 @@
+# Use of this file requires Shellcheck v0.7.0 or newer.
+#
+# SC2064 - We intentionally want variables to expand immediately within traps
+# so the trap can not fail due to variable interpolation later.
+#
+disable=SC2064
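The immediate expansion that SC2064 normally flags is exactly what the cleanup traps in these scripts rely on (e.g. `trap "rm -f '$dst_local' '$diff'" EXIT`). A minimal sketch of the pattern, with a throwaway temp file:

```sh
#!/bin/bash
set -euo pipefail

tmp="$(mktemp -t sc2064-demo-XXXXXX)"

# Double quotes make "$tmp" expand *now*, so the trap holds the literal
# path and still removes the file even if tmp is later reassigned or
# unset before the script exits.
trap "rm -f '$tmp'" EXIT

echo "scratch work" > "$tmp"
```

With single quotes instead, `$tmp` would expand only when the trap fires, at which point the variable may no longer hold the right path; disabling SC2064 documents that this trade-off is intentional.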