Merge branch 'dev' into ds_gwas_update
ireneisdoomed committed Mar 6, 2024
2 parents d4dd97e + 9f1bec0 commit 97bf3f0
Showing 27 changed files with 1,145 additions and 253 deletions.
63 changes: 63 additions & 0 deletions .github/labeler.yml
@@ -0,0 +1,63 @@
+version: 1
+labels:
+  - label: "size-XS"
+    size:
+      exclude-files: ["poetry.lock"]
+      below: 10
+  - label: "size-S"
+    size:
+      exclude-files: ["poetry.lock"]
+      above: 9
+      below: 100
+  - label: "size-M"
+    size:
+      exclude-files: ["poetry.lock"]
+      above: 100
+      below: 500
+  - label: "size-L"
+    size:
+      exclude-files: ["poetry.lock"]
+      above: 499
+      below: 1000
+  - label: "size-XL"
+    size:
+      exclude-files: ["poetry.lock"]
+      above: 999
+  - label: "airflow"
+    files:
+      - "src/airflow/.*"
+  - label: "Documentation"
+    files:
+      - "docs/.*"
+  - label: "Dataset"
+    files:
+      - "src/gentropy/dataset/.*"
+  - label: "Method"
+    files:
+      - "src/gentropy/method/.*"
+  - label: "Datasource"
+    files:
+      - "src/gentropy/datasource/.*"
+  - label: "Step"
+    files:
+      - "src/gentropy/[a-zA-Z]\\w+\\.py"
+  - label: "Feature"
+    title: "feat.*"
+  - label: "Bug"
+    title: "fix.*"
+  - label: "Refactor"
+    title: "refactor.*"
+  - label: "Chore"
+    title: "chore.*"
+  - label: "CI"
+    title: "ci.*"
+  - label: "Test"
+    title: "test.*"
+  - label: "Documentation"
+    title: "docs.*"
+  - label: "Performance"
+    title: "perf.*"
+  - label: "Build"
+    title: "build.*"
+  - label: "Revert"
+    title: "revert.*"
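Review note on the size rules in `.github/labeler.yml`: the sketch below replays the five size thresholds in plain Python to show which label a given PR size would receive. It assumes that `above` and `below` are both exclusive bounds; that is an assumption about the labeler action, not something this diff states. `SIZE_RULES` and `size_labels` are hypothetical names used only for illustration.

```python
from __future__ import annotations

# Thresholds copied from .github/labeler.yml: (label, above, below).
# None means the bound is not set in the config.
SIZE_RULES: list[tuple[str, int | None, int | None]] = [
    ("size-XS", None, 10),
    ("size-S", 9, 100),
    ("size-M", 100, 500),
    ("size-L", 499, 1000),
    ("size-XL", 999, None),
]


def size_labels(changed_lines: int) -> list[str]:
    """Return every size label whose bounds contain `changed_lines`.

    Assumption: both bounds are exclusive, i.e. above < changed_lines < below.
    """
    matched = []
    for label, above, below in SIZE_RULES:
        if above is not None and changed_lines <= above:
            continue  # not strictly above the lower bound
        if below is not None and changed_lines >= below:
            continue  # not strictly below the upper bound
        matched.append(label)
    return matched


if __name__ == "__main__":
    for size in (5, 10, 99, 100, 101, 500, 1000):
        print(size, size_labels(size))
```

If the exclusive-bounds assumption holds, a PR with exactly 100 changed lines matches no size label, because `size-S` ends below 100 while `size-M` starts above 100; the other boundaries (9/10, 499/500, 999/1000) leave no such gap.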
28 changes: 28 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,28 @@
+## ✨ Context
+
+<!---
+Congratulations! You've made it this far!
+Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your contribution.
+What's the context for the changes? If the changes are related to a specific issue, please [link](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) to it:
+-->
+
+## 🛠 What does this PR implement
+
+<!--- _Detailed description of the changes introduced, Give examples of the changes you've made in this pull request, include an itemized list if you can and
+add diagrams or images if necessary. It'll help the reviewer_ -->
+
+## 🙈 Missing
+
+<!--- If there are things that are requested in the task and were not implemented, list them here -->
+
+## 🚦 Before submitting
+
+- [ ] Do these changes cover one single feature (one change at a time)?
+- [ ] Did you read the [contributor guideline](https://opentargets.github.io/gentropy/development/contributing/#contributing-checklist)?
+- [ ] Did you make sure to update the documentation with your changes?
+- [ ] Did you make sure there is no commented out code in this PR?
+- [ ] Did you follow [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/) standards in PR title and commit messages?
+- [ ] Did you make sure the branch is up-to-date with the `dev` branch?
+- [ ] Did you write any new necessary tests?
+- [ ] Did you make sure the changes pass local tests (`make test`)?
+- [ ] Did you make sure the changes pass pre-commit rules (e.g `poetry run pre-commit run --all-files`)?
14 changes: 14 additions & 0 deletions .github/workflows/labeler.yml
@@ -0,0 +1,14 @@
+name: Label PRs
+
+"on":
+  - pull_request
+  - issues
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: srvaroa/labeler@master
+        env:
+          GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
1 change: 0 additions & 1 deletion config/datasets/ot_gcp.yaml
@@ -39,7 +39,6 @@ thurman: ${datasets.static_assets}/thurman2012/genomewideCorrs_above0.7_promoter
 target_index: ${datasets.release_folder}/targets # OTP 23.12 data

 gene_interactions: ${datasets.release_folder}/interaction # OTP 23.12 data
-eqtl_catalogue_paths_imported: ${datasets.inputs}/preprocess/eqtl_catalogue/tabix_ftp_paths_imported.tsv
 finngen_finemapping_results_path: ${datasets.inputs}/Finngen_susie_finemapping_r10/full
 finngen_finemapping_summaries_path: ${datasets.inputs}/Finngen_susie_finemapping_r10/Finngen_susie_credset_summary_r10.tsv

9 changes: 6 additions & 3 deletions config/step/ot_eqtl_catalogue.yaml
@@ -1,6 +1,9 @@
 defaults:
   - eqtl_catalogue

-eqtl_catalogue_paths_imported: ${datasets.eqtl_catalogue_paths_imported}
-eqtl_catalogue_study_index_out: ${datasets.eqtl_catalogue_study_index_out}
-eqtl_catalogue_summary_stats_out: ${datasets.eqtl_catalogue_summary_stats_out}
+eqtl_catalogue_paths_imported: ???
+eqtl_catalogue_study_index_out: ???
+eqtl_catalogue_credible_sets_out: ???
+session:
+  extended_spark_conf:
+    "spark.sql.shuffle.partitions": "3200"
8 changes: 2 additions & 6 deletions docs/python_api/datasources/eqtl_catalogue/_eqtl_catalogue.md
@@ -2,10 +2,6 @@
 title: eQTL Catalogue
 ---

-The [eQTL Catalogue](https://www.ebi.ac.uk/eqtl/) aims to provide uniformly processed gene expression and splicing Quantitative Trait Loci (QTLs) from all available public studies on humans.
+The [eQTL Catalogue](https://www.ebi.ac.uk/eqtl/) aims to provide unified gene, protein expression and splicing Quantitative Trait Loci (QTLs) from all available human public studies.

-It serves as the ultimate resource of eQTLs that we use for colocalization and target prioritization.
-
-We utilize data from the following study within the eQTL Catalogue:
-
-1. **GTEx v8**, 49 tissues
+It serves as the ultimate resource of mQTLs that we use for colocalization and target prioritization.
5 changes: 5 additions & 0 deletions docs/python_api/datasources/eqtl_catalogue/finemapping.md
@@ -0,0 +1,5 @@
+---
+title: Fine mapping results
+---
+
+::: gentropy.datasource.eqtl_catalogue.finemapping.EqtlCatalogueFinemapping
2 changes: 1 addition & 1 deletion docs/python_api/steps/eqtl_catalogue.md
@@ -1,5 +1,5 @@
 ---
-title: eqtl_catalogue
+title: eQTL Catalogue
 ---

 ::: gentropy.eqtl_catalogue.EqtlCatalogueStep
74 changes: 74 additions & 0 deletions src/airflow/dags/eqtl_preprocess.py
@@ -0,0 +1,74 @@
+"""Airflow DAG to extract credible sets and a study index from eQTL Catalogue's finemapping results."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import common_airflow as common
+from airflow.models.dag import DAG
+from airflow.providers.google.cloud.operators.dataflow import (
+    DataflowTemplatedJobStartOperator,
+)
+from airflow.providers.google.cloud.operators.gcs import GCSDeleteBucketOperator
+
+CLUSTER_NAME = "otg-preprocess-eqtl"
+AUTOSCALING = "do-ld-explosion"
+PROJECT_ID = "open-targets-genetics-dev"
+
+EQTL_CATALOG_SUSIE_LOCATION = "gs://eqtl_catalog_data/ebi_ftp/susie"
+TEMP_DECOMPRESS_LOCATION = "gs://eqtl_catalog_data/susie_decompressed_tmp"
+DECOMPRESS_FAILED_LOG = f"{TEMP_DECOMPRESS_LOCATION}.log"
+STUDY_INDEX_PATH = "gs://eqtl_catalog_data/study_index"
+CREDIBLE_SET_PATH = "gs://eqtl_catalog_data/credible_set_datasets/susie"
+
+with DAG(
+    dag_id=Path(__file__).stem,
+    description="Open Targets Genetics — eQTL preprocess",
+    default_args=common.shared_dag_args,
+    **common.shared_dag_kwargs,
+):
+    # SuSIE fine mapping results are stored as gzipped files in a GCS bucket.
+    # To improve processing performance, we decompress the files before processing to a temporary location in GCS.
+    decompression_job = DataflowTemplatedJobStartOperator(
+        task_id="decompress_susie_outputs",
+        template="gs://dataflow-templates/latest/Bulk_Decompress_GCS_Files",
+        location="europe-west1",
+        project_id=PROJECT_ID,
+        parameters={
+            "inputFilePattern": f"{EQTL_CATALOG_SUSIE_LOCATION}/**/*.gz",
+            "outputDirectory": TEMP_DECOMPRESS_LOCATION,
+            "outputFailureFile": DECOMPRESS_FAILED_LOG,
+        },
+    )
+
+    ingestion_job = common.submit_step(
+        cluster_name=CLUSTER_NAME,
+        step_id="ot_eqtl_catalogue",
+        task_id="ot_eqtl_ingestion",
+        other_args=[
+            f"step.eqtl_catalogue_paths_imported={TEMP_DECOMPRESS_LOCATION}",
+            f"step.eqtl_catalogue_study_index_out={STUDY_INDEX_PATH}",
+            f"step.eqtl_catalogue_credible_sets_out={CREDIBLE_SET_PATH}",
+        ],
+    )
+
+    delete_decompressed_job = GCSDeleteBucketOperator(
+        task_id="delete_decompressed_files",
+        bucket_name=TEMP_DECOMPRESS_LOCATION,
+        force=True,
+        user_project=PROJECT_ID,
+    )
+
+    (
+        decompression_job
+        >> common.create_cluster(
+            CLUSTER_NAME,
+            autoscaling_policy=AUTOSCALING,
+            num_workers=4,
+            worker_machine_type="n1-highmem-8",
+        )
+        >> common.install_dependencies(CLUSTER_NAME)
+        >> ingestion_job
+        >> delete_decompressed_job
+        >> common.delete_cluster(CLUSTER_NAME)
+    )
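Review note on how the DAG feeds the step: `config/step/ot_eqtl_catalogue.yaml` now declares its three paths as `???` (mandatory, no default), and the DAG supplies them through the `step.<key>=<value>` strings in `other_args`. The sketch below imitates that hand-off with plain dictionaries. It is not how Hydra or OmegaConf resolve overrides internally; `apply_overrides` and `unresolved` are hypothetical helpers written only for this illustration.

```python
from __future__ import annotations

MISSING = "???"  # placeholder for a mandatory value, as in the step YAML

# Step options as declared in config/step/ot_eqtl_catalogue.yaml.
STEP_DEFAULTS = {
    "eqtl_catalogue_paths_imported": MISSING,
    "eqtl_catalogue_study_index_out": MISSING,
    "eqtl_catalogue_credible_sets_out": MISSING,
}

# Same constants as in the DAG.
TEMP_DECOMPRESS_LOCATION = "gs://eqtl_catalog_data/susie_decompressed_tmp"
STUDY_INDEX_PATH = "gs://eqtl_catalog_data/study_index"
CREDIBLE_SET_PATH = "gs://eqtl_catalog_data/credible_set_datasets/susie"


def apply_overrides(defaults: dict[str, str], overrides: list[str]) -> dict[str, str]:
    """Apply `step.<key>=<value>` overrides to a copy of `defaults`."""
    resolved = dict(defaults)
    for override in overrides:
        dotted_key, sep, value = override.partition("=")
        group, _, key = dotted_key.partition(".")
        if not sep or group != "step" or key not in resolved:
            raise ValueError(f"unsupported override: {override!r}")
        resolved[key] = value
    return resolved


def unresolved(config: dict[str, str]) -> list[str]:
    """List the options that still hold the mandatory-value placeholder."""
    return sorted(key for key, value in config.items() if value == MISSING)


other_args = [
    f"step.eqtl_catalogue_paths_imported={TEMP_DECOMPRESS_LOCATION}",
    f"step.eqtl_catalogue_study_index_out={STUDY_INDEX_PATH}",
    f"step.eqtl_catalogue_credible_sets_out={CREDIBLE_SET_PATH}",
]
resolved = apply_overrides(STEP_DEFAULTS, other_args)

if __name__ == "__main__":
    print("before overrides:", unresolved(STEP_DEFAULTS))
    print("after overrides:", unresolved(resolved))
```

The practical consequence of moving from `${datasets...}` interpolations to `???` is that the step can no longer run on dataset defaults alone: every caller has to pass all three paths, as this DAG does.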
1 change: 1 addition & 0 deletions src/airflow/requirements.txt
@@ -1,2 +1,3 @@
apache-airflow-providers-google==10.10.1
apache-airflow-providers-apache-beam==5.6.1
psycopg2-binary==2.9.9
8 changes: 7 additions & 1 deletion src/gentropy/assets/schemas/study_index.json
@@ -22,7 +22,7 @@
     {
       "name": "traitFromSource",
       "type": "string",
-      "nullable": false,
+      "nullable": true,
       "metadata": {}
     },
     {
@@ -41,6 +41,12 @@
       "nullable": true,
       "metadata": {}
     },
+    {
+      "name": "tissueFromSourceId",
+      "type": "string",
+      "nullable": true,
+      "metadata": {}
+    },
     {
       "name": "pubmedId",
       "type": "string",
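Review note on the `study_index.json` change: relaxing `traitFromSource` to nullable and adding a nullable `tissueFromSourceId` means records may omit both fields. The sketch below checks records against the `nullable` flags, assuming the schema file follows the Spark-style `{"fields": [...]}` layout. `requiredExample` is a hypothetical non-nullable field added only so the check has something to reject, and `null_violations` is likewise a hypothetical helper.

```python
import json

# Two fields taken from the diff, plus one hypothetical required field.
SCHEMA_FRAGMENT = json.loads(
    """
    {
      "fields": [
        {"name": "traitFromSource", "type": "string", "nullable": true, "metadata": {}},
        {"name": "tissueFromSourceId", "type": "string", "nullable": true, "metadata": {}},
        {"name": "requiredExample", "type": "string", "nullable": false, "metadata": {}}
      ]
    }
    """
)


def null_violations(record: dict, schema: dict) -> list:
    """Return names of non-nullable fields that are absent or None in `record`."""
    return [
        field["name"]
        for field in schema["fields"]
        if not field["nullable"] and record.get(field["name"]) is None
    ]


if __name__ == "__main__":
    # A record without traitFromSource or tissueFromSourceId is acceptable.
    print(null_violations({"requiredExample": "x"}, SCHEMA_FRAGMENT))
    # A record missing the hypothetical required field is flagged.
    print(null_violations({"traitFromSource": "example trait"}, SCHEMA_FRAGMENT))
```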
9 changes: 7 additions & 2 deletions src/gentropy/config.py
@@ -112,7 +112,12 @@ class EqtlCatalogueConfig(StepConfig):

     eqtl_catalogue_paths_imported: str = MISSING
     eqtl_catalogue_study_index_out: str = MISSING
-    eqtl_catalogue_summary_stats_out: str = MISSING
+    eqtl_catalogue_credible_sets_out: str = MISSING
+    mqtl_quantification_methods: list[str] = field(
+        default_factory=lambda: [
+            "ge",
+        ]
+    )
     _target_: str = "gentropy.eqtl_catalogue.EqtlCatalogueStep"


@@ -263,7 +268,7 @@ class VariantAnnotationConfig(StepConfig):
         }
     )
     variant_annotation_path: str = MISSING
-    _target_: str = "gentropytropy.variant_annotation.VariantAnnotationStep"
+    _target_: str = "gentropy.variant_annotation.VariantAnnotationStep"


 @dataclass
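Review note on `mqtl_quantification_methods`: the new option uses `field(default_factory=...)` because a dataclass rejects a bare list as a default, and a factory gives each config instance its own list. The sketch below shows that behaviour with the standard library only. `ExampleStepConfig` is a hypothetical stand-in for `EqtlCatalogueConfig`, `MISSING` is a local stand-in for the value imported in `config.py`, and `"exon"` is an illustrative value that does not come from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MISSING = "???"  # local stand-in for the MISSING marker used in config.py


@dataclass
class ExampleStepConfig:
    """Hypothetical, reduced version of a step config."""

    credible_sets_out: str = MISSING
    # A plain `= ["ge"]` default raises ValueError when the class is defined;
    # the factory builds a fresh list for every instance instead.
    quantification_methods: list[str] = field(default_factory=lambda: ["ge"])


first = ExampleStepConfig()
second = ExampleStepConfig()
first.quantification_methods.append("exon")  # illustrative value only

if __name__ == "__main__":
    print(first.quantification_methods)
    print(second.quantification_methods)
```

Because the two instances hold separate lists, appending to one leaves the other at its default.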