refactor(L2GFeatureMatrix)!: streamline feature matrix management #745

Merged: 48 commits, Sep 23, 2024
50d98ed
refactor(L2GFeatureMatrix): remove schema validation
ireneisdoomed Sep 3, 2024
66a0f0b
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 3, 2024
e1f7c5c
refactor(FeatureFactory): reshape feature generation WIP
ireneisdoomed Sep 3, 2024
a7757ac
chore: pre-commit auto fixes [...]
pre-commit-ci[bot] Sep 3, 2024
646a810
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 4, 2024
8a70bf2
chore: set l2gfeature properties with decorator
ireneisdoomed Sep 6, 2024
c690ffc
chore(l2gfeature): make credible_set and input_dependency instance at…
ireneisdoomed Sep 6, 2024
a54e694
chore(l2gfeature): make credible_set and input_dependency instance at…
ireneisdoomed Sep 6, 2024
85a7bf4
chore(featurefactory): distanceTssMeanFeature working
ireneisdoomed Sep 6, 2024
d24de6d
refactor(l2g): improve step dependency management
ireneisdoomed Sep 9, 2024
6a3af69
feat: implement
ireneisdoomed Sep 9, 2024
09d5291
chore: fix mypy issues
ireneisdoomed Sep 9, 2024
6211d8d
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 9, 2024
5561b74
Merge branch 'il-3252' of https://github.com/opentargets/gentropy int…
ireneisdoomed Sep 9, 2024
b1f607b
feat: l2gfeaturematrix.from_features_list working
ireneisdoomed Sep 9, 2024
021e159
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 10, 2024
da20073
chore: comment out obsolete refs
ireneisdoomed Sep 10, 2024
d06c059
chore(L2GFeatureMatrix): change `mode` attribute to `with_gold_standard`
ireneisdoomed Sep 10, 2024
0a007a7
refactor(l2g): move feature matrix writing to training module
ireneisdoomed Sep 10, 2024
abfdf22
feat(L2GFeatureMatrix): accept L2GGoldStandard or StudyLocus as inputs
ireneisdoomed Sep 10, 2024
1eed6f3
feat: implement methods to build a feature matrix based on a studyloc…
ireneisdoomed Sep 10, 2024
b4a86a1
feat: coloc logic prototype
ireneisdoomed Sep 10, 2024
0b09193
feat(l2g): filter non gwas credible sets at the start of the step
ireneisdoomed Sep 11, 2024
a60095b
feat: rewrite colocalisation feature factory
ireneisdoomed Sep 13, 2024
16085ad
test: add `test_colocalisation_feature_type`
ireneisdoomed Sep 13, 2024
7ab1ff1
test(colocalisation): add test_extract_maximum_coloc_probability_per_…
ireneisdoomed Sep 13, 2024
e56e8ea
feat(L2GFeatureInputLoader): support multiple deps by passing loader …
ireneisdoomed Sep 13, 2024
b8525ad
test: add integration tests `test_build_feature_matrix`
ireneisdoomed Sep 13, 2024
ad8481e
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 13, 2024
3fa9b55
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 18, 2024
95793c6
chore: drop config yamls
ireneisdoomed Sep 18, 2024
cb5c169
refactor: move feature classes to datasets module
ireneisdoomed Sep 18, 2024
d3498b4
docs: update feature docs
ireneisdoomed Sep 18, 2024
ead7288
refactor(colocalisation): cleaner joins in `append_right_study_metadata`
ireneisdoomed Sep 19, 2024
8c95bd4
chore: better logging abstract methods
ireneisdoomed Sep 19, 2024
8e2460e
test: add `L2GFeatureMatrix.test_from_features_list` unit tests
ireneisdoomed Sep 20, 2024
f9d9fd4
fix: add goldStandardSet when a gs instance is passed to `from_featur…
ireneisdoomed Sep 20, 2024
5b21367
fix: lowercase colocalisation type and add semantic test
ireneisdoomed Sep 20, 2024
25e0c45
test: add semantic test for `append_right_study_metadata`
ireneisdoomed Sep 20, 2024
a322b7c
feat(colocalisation): make `append_right_study_metadata` extensible t…
ireneisdoomed Sep 20, 2024
7da2102
fix(colocalisation): append_study_metadata cant take a gold standard
ireneisdoomed Sep 20, 2024
a25e66e
fix(colocalisation): extract_maximum_coloc_probability_per_region_and…
ireneisdoomed Sep 23, 2024
3d463d9
feat: add `StudyLocus` as a dependency of colocalisation features
ireneisdoomed Sep 23, 2024
e889be6
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 23, 2024
80b62dd
fix: add studylocus to input loader in test
ireneisdoomed Sep 23, 2024
d863a33
fix: add studylocus to input loader in test
ireneisdoomed Sep 23, 2024
b17c538
fix: add studylocus to input loader in test
ireneisdoomed Sep 23, 2024
0675972
Merge branch 'dev' of https://github.com/opentargets/gentropy into il…
ireneisdoomed Sep 23, 2024
22 changes: 21 additions & 1 deletion docs/python_api/datasets/l2g_feature.md
@@ -2,7 +2,27 @@
title: L2G Feature
---

-::: gentropy.method.l2g.feature_factory.L2GFeature
+## Abstract Class
+
+::: gentropy.dataset.l2g_feature.L2GFeature
+
+## Feature Classes
+
+### Derived from colocalisation
+
+::: gentropy.dataset.l2g_feature.EQtlColocClppMaximumFeature
+::: gentropy.dataset.l2g_feature.PQtlColocClppMaximumFeature
+::: gentropy.dataset.l2g_feature.SQtlColocClppMaximumFeature
+::: gentropy.dataset.l2g_feature.TuQtlColocClppMaximumFeature
+::: gentropy.dataset.l2g_feature.EQtlColocH4MaximumFeature
+::: gentropy.dataset.l2g_feature.PQtlColocH4MaximumFeature
+::: gentropy.dataset.l2g_feature.SQtlColocH4MaximumFeature
+::: gentropy.dataset.l2g_feature.TuQtlColocH4MaximumFeature
+
+### Derived from distance
+
+::: gentropy.dataset.l2g_feature.DistanceTssMinimumFeature
+::: gentropy.dataset.l2g_feature.DistanceTssMeanFeature

## Schema
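
The documented class names appear to follow the feature names used in the `features_list` default of `src/gentropy/config.py`: the first letter is capitalised and a `Feature` suffix is appended (for example `eQtlColocClppMaximum` and `EQtlColocClppMaximumFeature`). The sketch below only illustrates that naming pattern. `feature_class_name` is a hypothetical helper, not part of gentropy, and nothing here claims to be how gentropy resolves feature names to classes.

```python
def feature_class_name(feature_name: str) -> str:
    """Guess the documented class name for a feature name (illustrative only).

    Assumption: class name == feature name with its first character
    upper-cased, followed by the suffix "Feature".
    """
    if not feature_name:
        raise ValueError("feature_name must be a non-empty string")
    return feature_name[0].upper() + feature_name[1:] + "Feature"


# Feature names taken from this pull request's config and docs diffs.
names = ["eQtlColocClppMaximum", "tuQtlColocH4Maximum", "distanceTssMean"]
class_names = [feature_class_name(name) for name in names]
```

Only the first character is changed, so the internal capitalisation of names such as `tuQtl...` is preserved.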

4 changes: 2 additions & 2 deletions docs/python_api/methods/l2g/feature_factory.md
@@ -2,6 +2,6 @@
title: L2G Feature Factory
---

-::: gentropy.method.l2g.feature_factory.ColocalisationFactory
+::: gentropy.method.l2g.feature_factory.FeatureFactory

-::: gentropy.method.l2g.feature_factory.StudyLocusFactory
+::: gentropy.method.l2g.feature_factory.L2GFeatureInputLoader
155 changes: 0 additions & 155 deletions src/gentropy/assets/schemas/l2g_feature_matrix.json

This file was deleted.

55 changes: 11 additions & 44 deletions src/gentropy/config.py
@@ -227,50 +227,16 @@ class LocusToGeneConfig(StepConfig):
     gene_interactions_path: str | None = None
     features_list: list[str] = field(
         default_factory=lambda: [
-            # average distance of all tagging variants to gene TSS
-            "distanceTssMean",
-            # minimum distance of all tagging variants to gene TSS
-            "distanceTssMinimum",
-            # maximum vep consequence score of the locus 95% credible set among all genes in the vicinity
-            "vepMaximumNeighborhood",
-            # maximum vep consequence score of the locus 95% credible set split by gene
-            "vepMaximum",
-            # mean vep consequence score of the locus 95% credible set among all genes in the vicinity
-            "vepMeanNeighborhood",
-            # mean vep consequence score of the locus 95% credible set split by gene
-            "vepMean",
-            # max clpp for each (study, locus, gene) aggregating over all eQTLs
-            "eqtlColocClppMaximum",
-            # max clpp for each (study, locus) aggregating over all eQTLs
-            "eqtlColocClppMaximumNeighborhood",
-            # max clpp for each (study, locus, gene) aggregating over all pQTLs
-            "pqtlColocClppMaximum",
-            # max clpp for each (study, locus) aggregating over all pQTLs
-            "pqtlColocClppMaximumNeighborhood",
-            # max clpp for each (study, locus, gene) aggregating over all sQTLs
-            "sqtlColocClppMaximum",
-            # max clpp for each (study, locus) aggregating over all sQTLs
-            "sqtlColocClppMaximumNeighborhood",
-            # max clpp for each (study, locus) aggregating over all tuQTLs
-            "tuqtlColocClppMaximum",
-            # max clpp for each (study, locus, gene) aggregating over all tuQTLs
-            "tuqtlColocClppMaximumNeighborhood",
-            # max log-likelihood ratio value for each (study, locus, gene) aggregating over all eQTLs
-            "eqtlColocLlrMaximum",
-            # max log-likelihood ratio value for each (study, locus) aggregating over all eQTLs
-            "eqtlColocLlrMaximumNeighborhood",
-            # max log-likelihood ratio value for each (study, locus, gene) aggregating over all pQTLs
-            "pqtlColocLlrMaximum",
-            # max log-likelihood ratio value for each (study, locus) aggregating over all pQTLs
-            "pqtlColocLlrMaximumNeighborhood",
-            # max log-likelihood ratio value for each (study, locus, gene) aggregating over all sQTLs
-            "sqtlColocLlrMaximum",
-            # max log-likelihood ratio value for each (study, locus) aggregating over all sQTLs
-            "sqtlColocLlrMaximumNeighborhood",
-            # max log-likelihood ratio value for each (study, locus, gene) aggregating over all tuQTLs
-            "tuqtlColocLlrMaximum",
-            # max log-likelihood ratio value for each (study, locus) aggregating over all tuQTLs
-            "tuqtlColocLlrMaximumNeighborhood",
+            # max CLPP for each (study, locus, gene) aggregating over a specific qtl type
+            "eQtlColocClppMaximum",
+            "pQtlColocClppMaximum",
+            "sQtlColocClppMaximum",
+            "tuQtlColocClppMaximum",
+            # max H4 for each (study, locus, gene) aggregating over a specific qtl type
+            "eQtlColocH4Maximum",
+            "pQtlColocH4Maximum",
+            "sQtlColocH4Maximum",
+            "tuQtlColocH4Maximum",
         ]
     )
     hyperparameters: dict[str, Any] = field(
@@ -283,6 +249,7 @@ class LocusToGeneConfig(StepConfig):
     wandb_run_name: str | None = None
     hf_hub_repo_id: str | None = "opentargets/locus_to_gene"
     download_from_hub: bool = True
+    write_feature_matrix: bool = True
     _target_: str = "gentropy.l2g.LocusToGeneStep"
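
The config above declares its list default through `field(default_factory=...)` and adds a boolean `write_feature_matrix` flag. As a minimal sketch of why the factory form matters, the example below uses a stand-in dataclass: `ExampleStepConfig` is hypothetical and is not gentropy's `LocusToGeneConfig`, and only a subset of the feature names is copied from the diff. It shows that each instance gets its own list, so editing one config's features does not leak into another.

```python
from dataclasses import dataclass, field


@dataclass
class ExampleStepConfig:
    """Hypothetical stand-in for a step config; not the real LocusToGeneConfig."""

    # default_factory builds a fresh list for every instance. A bare list
    # default is rejected by dataclasses because it would be shared.
    features_list: list[str] = field(
        default_factory=lambda: [
            "eQtlColocClppMaximum",
            "eQtlColocH4Maximum",
        ]
    )
    # Flag name copied from the diff; its use here is purely illustrative.
    write_feature_matrix: bool = True


first = ExampleStepConfig()
second = ExampleStepConfig()

# Mutating one instance leaves the other instance's defaults untouched.
first.features_list.append("distanceTssMean")

# Individual fields can be overridden at construction time.
no_write = ExampleStepConfig(write_feature_matrix=False)
```

Because the factory runs once per instance, `second.features_list` still holds only the two default names after `first` is modified.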

