feat: step to export disease/target evidence #867

DSuveges · 2024-10-22T12:47:10Z

✨ Context

This PR implements the following:

- Evidence export method added to l2g dataset.
- Step to export evidence.
- Configuration for the step.


ireneisdoomed

Thank you, Daniel!
I have some questions; please have a look.

ireneisdoomed · 2024-10-23T10:22:44Z

src/gentropy/config.py

+class LocusToGeneEvidenceStepConfig(StepConfig):
+    """Configuration of the locus to gene evidence step."""
+
+    session: Any = field(


Do you need to specify the session in the config? Hail is set to false by default.

ireneisdoomed · 2024-10-23T10:29:41Z

src/gentropy/dataset/l2g_prediction.py

+        study_locus: StudyLocus,
+        study_index: StudyIndex,
+        l2g_threshold: float = 0.05,
+    ) -> DataFrame:


Effectively, this is another source of evidence generation, and unlike the others it is not going to go through validation.
It may be annoying, but I think it'd be nice to include a new data model.

I was thinking about this, and I'm not convinced. The evidence has no downstream application, and no methods are called on an evidence dataset, especially considering that we only have five or so columns, of which two are constant. I'm quite inclined not to have another dataset. Maybe later, e.g. if the generation of the VEP input requires it.

ireneisdoomed · 2024-10-23T10:33:29Z

src/gentropy/dataset/l2g_prediction.py

+
+        return (
+            self.df.filter(f.col("score") >= l2g_threshold)
+            .join(study_locus.df, on="studyLocusId", how="inner")


To make the join lighter

Suggested change

-            .join(study_locus.df, on="studyLocusId", how="inner")
+            .join(study_locus.df.select("studyLocusId", "studyId"), on="studyLocusId", how="inner")

I suspect Spark optimizes this in the background, but it's better to be explicit.

ireneisdoomed · 2024-10-23T10:36:58Z

src/gentropy/dataset/l2g_prediction.py

+        return (
+            self.df.filter(f.col("score") >= l2g_threshold)
+            .join(study_locus.df, on="studyLocusId", how="inner")
+            .join(study_index.df.drop("geneId"), on="studyId", how="inner")


To make the join lighter

Suggested change

-            .join(study_index.df.drop("geneId"), on="studyId", how="inner")
+            .join(study_index.df.select("studyId", "diseaseIds"), on="studyId", how="inner")

ireneisdoomed · 2024-10-23T10:37:50Z

src/gentropy/dataset/l2g_prediction.py

+                f.col("geneId").alias("targetFromSourceId"),
+                f.explode(f.col("diseaseIds")).alias("diseaseFromSourceMappedId"),
+                f.col("score").alias("resourceScore"),
+                f.col("studyLocusId").alias("studyLocusId"),


Redundant?

Suggested change

-                f.col("studyLocusId").alias("studyLocusId"),
+                f.col("studyLocusId"),
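
Taken together, the hunks above filter predictions by score, inner-join them to the study locus and study index tables, and explode `diseaseIds` into one row per disease. A minimal plain-Python sketch of that logic follows, so it can be read without a Spark session. It is an illustration, not the method in this PR: the function name `export_evidence`, the list-of-dict inputs, and all identifier values are hypothetical, and it assumes `studyLocusId` and `studyId` are unique keys in their lookup tables.

```python
from typing import Any, Iterable, Iterator

Row = dict[str, Any]


def export_evidence(
    predictions: Iterable[Row],
    study_locus: Iterable[Row],
    study_index: Iterable[Row],
    l2g_threshold: float = 0.05,
) -> Iterator[Row]:
    """Yield one evidence record per (prediction, disease) pair."""
    # Keep only the columns each join needs, mirroring the select() suggestions.
    # Assumption: each key appears once in its lookup table.
    locus_to_study = {row["studyLocusId"]: row["studyId"] for row in study_locus}
    study_to_diseases = {row["studyId"]: row["diseaseIds"] for row in study_index}

    for prediction in predictions:
        # Same condition as f.col("score") >= l2g_threshold.
        if prediction["score"] < l2g_threshold:
            continue
        # Inner joins: drop predictions without a matching locus or study.
        study_id = locus_to_study.get(prediction["studyLocusId"])
        if study_id is None or study_id not in study_to_diseases:
            continue
        # Explode: one record per disease; a missing or empty list yields nothing.
        for disease_id in study_to_diseases[study_id] or []:
            yield {
                "targetFromSourceId": prediction["geneId"],
                "diseaseFromSourceMappedId": disease_id,
                "resourceScore": prediction["score"],
                "studyLocusId": prediction["studyLocusId"],
            }


# Made-up example data.
predictions = [
    {"studyLocusId": "sl1", "geneId": "ENSG_A", "score": 0.8},
    {"studyLocusId": "sl2", "geneId": "ENSG_B", "score": 0.01},  # below threshold
    {"studyLocusId": "sl3", "geneId": "ENSG_C", "score": 0.5},  # no matching locus
]
study_locus = [
    {"studyLocusId": "sl1", "studyId": "s1"},
    {"studyLocusId": "sl2", "studyId": "s1"},
]
study_index = [{"studyId": "s1", "diseaseIds": ["EFO_1", "EFO_2"]}]

evidence = list(export_evidence(predictions, study_locus, study_index))
```

Only the first prediction survives both the threshold and the joins, and it produces two records because its study maps to two diseases.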

ireneisdoomed · 2024-10-23T10:41:19Z

src/gentropy/l2g.py

+        credible_set_path: str,
+        study_index_path: str,
+        evidence_output_path: str,
+        locus_to_gene_threshold: float = LocusToGeneEvidenceStepConfig().locus_to_gene_threshold,


I think I know now why you are passing the session to the config. I had this scenario before. If you call LocusToGeneEvidenceStepConfig(), you are going to trigger Spark to get or create (in this case get) the session, which is probably not desirable. Because of that, I suggest not getting the default from the config.

Yes, exactly.

So do you agree that the action here is not to pull the parameter from the config and remove the session reference?

In the case of the float, you could skip evaluating LocusToGeneEvidenceStepConfig. Primitive types are defined at compile time, if I am not mistaken.
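
For reference on the point discussed in this thread: Python evaluates a default-argument expression once, when the `def` statement runs, regardless of the type of the value. So a default written as `SomeConfig().attribute` instantiates the config at import time, with whatever side effects that carries. The sketch below demonstrates this with a hypothetical `DemoStepConfig`, which is a stand-in and not gentropy's class; the `created` list stands in for a side effect such as getting or creating a Spark session.

```python
from dataclasses import dataclass

created: list[str] = []  # one entry per config instantiation


@dataclass
class DemoStepConfig:
    """Hypothetical stand-in for a step config."""

    locus_to_gene_threshold: float = 0.05

    def __post_init__(self) -> None:
        # Stand-in for a side effect such as getting or creating a session.
        created.append("instantiated")


# The default expression is evaluated here, once, when the def statement runs.
def step_with_config_default(
    threshold: float = DemoStepConfig().locus_to_gene_threshold,
) -> float:
    return threshold


instantiations_at_definition = len(created)


# A literal default has no side effect at definition time.
def step_with_literal_default(threshold: float = 0.05) -> float:
    return threshold


# Calling the function does not re-evaluate the default expression.
step_with_config_default()
step_with_config_default()
instantiations_after_calls = len(created)
```

The config is instantiated exactly once, at definition time and before any call, which is why dropping the config-derived default (or using a literal) avoids the unwanted side effect.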

ireneisdoomed · 2024-10-23T10:43:09Z

src/gentropy/l2g.py

+                credible_sets, study_index, locus_to_gene_threshold
+            )
+            .write.mode(session.write_mode)
+            .json(evidence_output_path)


JSON? Is the plan to add a step in the evidence parsers that pulls this dataset and validates it there?

No. The evidence parsers are not expected to touch this dataset; however, the platform ETL, in its current form, ingests evidence as JSON. We can make it smarter later, but let's do one step at a time.
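
For context on the JSON choice: Spark's `DataFrameWriter.json` writes one JSON object per line, split across part files in the output directory. A minimal plain-Python sketch of that record layout follows. It uses only the four columns visible in the diff above; the identifier values are made up, the helper names are hypothetical, and the two constant columns mentioned earlier are left out because their names are not shown in this thread.

```python
import io
import json
from typing import Any, Iterable, TextIO

Record = dict[str, Any]


def write_json_lines(records: Iterable[Record], handle: TextIO) -> None:
    """Write one JSON object per line, the layout Spark uses for JSON output."""
    for record in records:
        handle.write(json.dumps(record) + "\n")


def read_json_lines(handle: TextIO) -> list[Record]:
    """Parse newline-delimited JSON back into records, skipping blank lines."""
    return [json.loads(line) for line in handle if line.strip()]


# Made-up evidence records with the columns selected in the export method.
records = [
    {
        "targetFromSourceId": "ENSG_A",
        "diseaseFromSourceMappedId": "EFO_1",
        "resourceScore": 0.8,
        "studyLocusId": "sl1",
    },
    {
        "targetFromSourceId": "ENSG_A",
        "diseaseFromSourceMappedId": "EFO_2",
        "resourceScore": 0.8,
        "studyLocusId": "sl1",
    },
]

buffer = io.StringIO()  # in-memory stand-in for a part file
write_json_lines(records, buffer)
serialized = buffer.getvalue()

buffer.seek(0)
round_tripped = read_json_lines(buffer)
```

Because each line is an independent JSON document, a downstream consumer can read the output line by line without loading the whole file.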


ireneisdoomed

Thank you! Last request: please reference this new step in docs/python_api/steps/l2g.md

feature(l2g): step to export disease/target evidence

e1bcd0e

github-actions bot added Dataset Step Feature labels Oct 22, 2024

Merge branch 'dev' into ds_evidence_step

d0e3084

DSuveges changed the title ~~feature(l2g): step to export disease/target evidence~~ feat(l2g): step to export disease/target evidence Oct 22, 2024

DSuveges changed the title ~~feat(l2g): step to export disease/target evidence~~ feat: step to export disease/target evidence Oct 22, 2024

DSuveges added 2 commits October 22, 2024 14:16

fix: sorting out typo in function docstring

40f4b4b

Merge branch 'ds_evidence_step' of https://github.com/opentargets/gen…

1b38a39

…tropy into ds_evidence_step

DSuveges requested a review from ireneisdoomed October 22, 2024 13:24

DSuveges linked an issue Oct 22, 2024 that may be closed by this pull request

Generate gentropy-based L2G evidence opentargets/issues#3453

Closed

DSuveges added 2 commits October 23, 2024 11:24

fix: evidence is written as json

f1dcf31

Merge branch 'dev' into ds_evidence_step

c936098

ireneisdoomed requested changes Oct 23, 2024


DSuveges added 3 commits October 24, 2024 10:22

fix: addressing reviewer comments

10e0c8e

Merge branch 'ds_evidence_step' of https://github.com/opentargets/gen…

1082886

…tropy into ds_evidence_step

Merge branch 'dev' into ds_evidence_step

8f245de

github-actions bot added the size-M label Oct 24, 2024

DSuveges requested a review from ireneisdoomed October 24, 2024 09:25

ireneisdoomed approved these changes Oct 24, 2024


docs: adding step documentation

427e727

github-actions bot added the documentation Improvements or additions to documentation label Oct 24, 2024

pre-commit-ci bot and others added 4 commits October 24, 2024 09:46

chore: pre-commit auto fixes [...]

7eaecc8

fix: removing default value from step definition

0146805

chore: merge from origin

7bc6acc

Merge branch 'dev' into ds_evidence_step

8090a2a

DSuveges merged commit d4b91d6 into dev Oct 24, 2024
5 checks passed

DSuveges deleted the ds_evidence_step branch October 24, 2024 10:13



