Merge branch 'dev' into ds_gwas_update
Showing 27 changed files with 1,145 additions and 253 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
version: 1 | ||
labels: | ||
  - label: "size-XS" | ||
    size: | ||
      exclude-files: ["poetry.lock"] | ||
      below: 10 | ||
  - label: "size-S" | ||
    size: | ||
      exclude-files: ["poetry.lock"] | ||
      above: 9 | ||
      below: 100 | ||
  - label: "size-M" | ||
    size: | ||
      exclude-files: ["poetry.lock"] | ||
      above: 99 | ||
      below: 500 | ||
  - label: "size-L" | ||
    size: | ||
      exclude-files: ["poetry.lock"] | ||
      above: 499 | ||
      below: 1000 | ||
  - label: "size-XL" | ||
    size: | ||
      exclude-files: ["poetry.lock"] | ||
      above: 999 | ||
  - label: "airflow" | ||
    files: | ||
      - "src/airflow/.*" | ||
  - label: "Documentation" | ||
    files: | ||
      - "docs/.*" | ||
  - label: "Dataset" | ||
    files: | ||
      - "src/gentropy/dataset/.*" | ||
  - label: "Method" | ||
    files: | ||
      - "src/gentropy/method/.*" | ||
  - label: "Datasource" | ||
    files: | ||
      - "src/gentropy/datasource/.*" | ||
  - label: "Step" | ||
    files: | ||
      - "src/gentropy/[a-zA-Z]\\w+\\.py" | ||
  - label: "Feature" | ||
    title: "feat.*" | ||
  - label: "Bug" | ||
    title: "fix.*" | ||
  - label: "Refactor" | ||
    title: "refactor.*" | ||
  - label: "Chore" | ||
    title: "chore.*" | ||
  - label: "CI" | ||
    title: "ci.*" | ||
  - label: "Test" | ||
    title: "test.*" | ||
  - label: "Documentation" | ||
    title: "docs.*" | ||
  - label: "Performance" | ||
    title: "perf.*" | ||
  - label: "Build" | ||
    title: "build.*" | ||
  - label: "Revert" | ||
    title: "revert.*" |
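
As a rough illustration of how size rules and title rules of this kind behave, the sketch below re-implements both in Python. It is not the labeler's code: `size_label` and `title_labels` are hypothetical helpers, the rule tables are made-up examples, and it assumes that `above`/`below` are strict bounds and that title patterns are matched unanchored. Both assumptions should be checked against the labeler's documentation.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule table: (label, above, below); None means "no bound".
SIZE_RULES: List[Tuple[str, Optional[int], Optional[int]]] = [
    ("small", None, 10),
    ("medium", 9, 100),
    ("large", 99, None),
]

# Hypothetical title patterns written in the same style as the config.
TITLE_RULES: List[Tuple[str, str]] = [("Feature", "feat.*"), ("Bug", "fix.*")]


def size_label(changed_lines: int, rules=SIZE_RULES) -> Optional[str]:
    """Return the first label whose strict bounds contain changed_lines."""
    for label, above, below in rules:
        if above is not None and changed_lines <= above:
            continue
        if below is not None and changed_lines >= below:
            continue
        return label
    return None


def title_labels(title: str, rules=TITLE_RULES, anchored: bool = False) -> List[str]:
    """Return every label whose pattern matches the title."""
    # re.match only matches at the start of the string; re.search matches anywhere.
    matcher = re.match if anchored else re.search
    return [label for label, pattern in rules if matcher(pattern, title)]


print(size_label(9))    # small
print(size_label(100))  # large
print(title_labels("feat: add prefix handling"))                 # ['Feature', 'Bug']
print(title_labels("feat: add prefix handling", anchored=True))  # ['Feature']
```

Under strict bounds, a pair such as `below: 100` followed by `above: 100` would leave a change of exactly 100 lines unlabeled. If patterns are unanchored, a title that merely contains `fix` (as in "prefix") would also pick up the `Bug` label, which a leading `^` in the pattern would avoid.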
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
## ✨ Context | ||
|
||
<!--- | ||
Congratulations! You've made it this far! | ||
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your contribution. | ||
What's the context for the changes? If the changes are related to a specific issue, please [link](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) to it: | ||
--> | ||
|
||
## 🛠 What does this PR implement | ||
|
||
<!--- _Detailed description of the changes introduced. Give examples of the changes you've made in this pull request, include an itemized list if you can, and | ||
add diagrams or images if necessary. It'll help the reviewer._ --> | ||
|
||
## 🙈 Missing | ||
|
||
<!--- If there are things that are requested in the task and were not implemented, list them here --> | ||
|
||
## 🚦 Before submitting | ||
|
||
- [ ] Do these changes cover one single feature (one change at a time)? | ||
- [ ] Did you read the [contributor guideline](https://opentargets.github.io/gentropy/development/contributing/#contributing-checklist)? | ||
- [ ] Did you make sure to update the documentation with your changes? | ||
- [ ] Did you make sure there is no commented out code in this PR? | ||
- [ ] Did you follow [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/) standards in PR title and commit messages? | ||
- [ ] Did you make sure the branch is up-to-date with the `dev` branch? | ||
- [ ] Did you write any new necessary tests? | ||
- [ ] Did you make sure the changes pass local tests (`make test`)? | ||
- [ ] Did you make sure the changes pass pre-commit rules (e.g. `poetry run pre-commit run --all-files`)? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
name: Label PRs | ||
|
||
"on": | ||
  - pull_request | ||
  - issues | ||
|
||
jobs: | ||
  build: | ||
    runs-on: ubuntu-latest | ||
|
||
    steps: | ||
      - uses: srvaroa/labeler@master | ||
        env: | ||
          GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,9 @@ | ||
defaults: | ||
  - eqtl_catalogue | ||
|
||
-eqtl_catalogue_paths_imported: ${datasets.eqtl_catalogue_paths_imported} | ||
-eqtl_catalogue_study_index_out: ${datasets.eqtl_catalogue_study_index_out} | ||
-eqtl_catalogue_summary_stats_out: ${datasets.eqtl_catalogue_summary_stats_out} | ||
+eqtl_catalogue_paths_imported: ??? | ||
+eqtl_catalogue_study_index_out: ??? | ||
+eqtl_catalogue_credible_sets_out: ??? | ||
+session: | ||
+  extended_spark_conf: | ||
+    "spark.sql.shuffle.partitions": "3200" |
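
`???` is OmegaConf's marker for a mandatory value that has to be supplied at run time, for example through overrides of the form `step.<key>=<value>`. The sketch below imitates that behaviour with plain dictionaries to show the idea. `apply_overrides` and `missing_keys` are hypothetical helpers, not Hydra or OmegaConf APIs, and the bucket path is made up.

```python
from typing import Any, Dict, List

MISSING = "???"  # marker for a mandatory value that has not been set yet


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'dotted.key=value' overrides in place and return the config."""
    for override in overrides:
        dotted_key, value = override.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


def missing_keys(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """List the dotted keys whose value is still the missing marker."""
    found: List[str] = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            found.extend(missing_keys(value, prefix=f"{path}."))
        elif value == MISSING:
            found.append(path)
    return found


config = {
    "step": {
        "eqtl_catalogue_paths_imported": MISSING,
        "eqtl_catalogue_study_index_out": MISSING,
    }
}
apply_overrides(config, ["step.eqtl_catalogue_paths_imported=gs://example-bucket/input"])
print(missing_keys(config))  # ['step.eqtl_catalogue_study_index_out']
```

Checking for leftover markers before a job starts gives an early, readable failure instead of an error deep inside the step.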
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
title: Fine mapping results | ||
--- | ||
|
||
::: gentropy.datasource.eqtl_catalogue.finemapping.EqtlCatalogueFinemapping |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
--- | ||
-title: eqtl_catalogue | ||
+title: eQTL Catalogue | ||
--- | ||
|
||
::: gentropy.eqtl_catalogue.EqtlCatalogueStep |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
"""Airflow DAG to extract credible sets and a study index from eQTL Catalogue's finemapping results.""" | ||
|
||
from __future__ import annotations | ||
|
||
from pathlib import Path | ||
|
||
import common_airflow as common | ||
from airflow.models.dag import DAG | ||
from airflow.providers.google.cloud.operators.dataflow import ( | ||
    DataflowTemplatedJobStartOperator, | ||
) | ||
from airflow.providers.google.cloud.operators.gcs import GCSDeleteObjectsOperator | ||
|
||
CLUSTER_NAME = "otg-preprocess-eqtl" | ||
AUTOSCALING = "do-ld-explosion" | ||
PROJECT_ID = "open-targets-genetics-dev" | ||
|
||
EQTL_CATALOG_SUSIE_LOCATION = "gs://eqtl_catalog_data/ebi_ftp/susie" | ||
TEMP_DECOMPRESS_LOCATION = "gs://eqtl_catalog_data/susie_decompressed_tmp" | ||
DECOMPRESS_FAILED_LOG = f"{TEMP_DECOMPRESS_LOCATION}.log" | ||
STUDY_INDEX_PATH = "gs://eqtl_catalog_data/study_index" | ||
CREDIBLE_SET_PATH = "gs://eqtl_catalog_data/credible_set_datasets/susie" | ||
|
||
with DAG( | ||
    dag_id=Path(__file__).stem, | ||
    description="Open Targets Genetics — eQTL preprocess", | ||
    default_args=common.shared_dag_args, | ||
    **common.shared_dag_kwargs, | ||
): | ||
    # SuSiE fine-mapping results are stored as gzipped files in a GCS bucket. | ||
    # To improve processing performance, we decompress the files to a temporary location in GCS before processing. | ||
    decompression_job = DataflowTemplatedJobStartOperator( | ||
        task_id="decompress_susie_outputs", | ||
        template="gs://dataflow-templates/latest/Bulk_Decompress_GCS_Files", | ||
        location="europe-west1", | ||
        project_id=PROJECT_ID, | ||
        parameters={ | ||
            "inputFilePattern": f"{EQTL_CATALOG_SUSIE_LOCATION}/**/*.gz", | ||
            "outputDirectory": TEMP_DECOMPRESS_LOCATION, | ||
            "outputFailureFile": DECOMPRESS_FAILED_LOG, | ||
        }, | ||
    ) | ||
|
||
    ingestion_job = common.submit_step( | ||
        cluster_name=CLUSTER_NAME, | ||
        step_id="ot_eqtl_catalogue", | ||
        task_id="ot_eqtl_ingestion", | ||
        other_args=[ | ||
            f"step.eqtl_catalogue_paths_imported={TEMP_DECOMPRESS_LOCATION}", | ||
            f"step.eqtl_catalogue_study_index_out={STUDY_INDEX_PATH}", | ||
            f"step.eqtl_catalogue_credible_sets_out={CREDIBLE_SET_PATH}", | ||
        ], | ||
    ) | ||
|
||
    # Delete only the temporary decompressed objects. The bucket also holds the | ||
    # source files and the outputs, so the bucket itself must not be deleted. | ||
    delete_decompressed_job = GCSDeleteObjectsOperator( | ||
        task_id="delete_decompressed_files", | ||
        bucket_name="eqtl_catalog_data", | ||
        prefix="susie_decompressed_tmp/", | ||
    ) | ||
|
||
    ( | ||
        decompression_job | ||
        >> common.create_cluster( | ||
            CLUSTER_NAME, | ||
            autoscaling_policy=AUTOSCALING, | ||
            num_workers=4, | ||
            worker_machine_type="n1-highmem-8", | ||
        ) | ||
        >> common.install_dependencies(CLUSTER_NAME) | ||
        >> ingestion_job | ||
        >> delete_decompressed_job | ||
        >> common.delete_cluster(CLUSTER_NAME) | ||
    ) |
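
Google Cloud Storage operators generally take a bare bucket name plus an object name or prefix rather than a full `gs://` URI, so paths like the ones defined in this DAG may need splitting before they are passed on. A minimal sketch using only the standard library follows. `split_gcs_uri` is a hypothetical helper, not part of Airflow or `common_airflow`.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Not a GCS URI: {uri!r}")
    # The path keeps its leading slash, which object names do not have.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_gcs_uri("gs://eqtl_catalog_data/susie_decompressed_tmp")
print(bucket)  # eqtl_catalog_data
print(prefix)  # susie_decompressed_tmp
```

Deriving the bucket and prefix from a single constant keeps the cleanup target in step with the location the decompression job writes to.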
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
apache-airflow-providers-google==10.10.1 | ||
apache-airflow-providers-apache-beam==5.6.1 | ||
psycopg2-binary==2.9.9 |