feat: add step to generate association data #888

Merged
merged 8 commits into dev from vh-l2g-associations on Nov 1, 2024

Conversation

Contributor

@vivienho vivienho commented Oct 31, 2024

✨ Context

We want to generate associations from l2g evidence without relying on the platform ETL.

🛠 What does this PR implement

This PR adds a step to generate direct and indirect associations from l2g evidence and saves them as parquet files.

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@github-actions github-actions bot added the documentation, size-S, Step, and Feature labels Oct 31, 2024
),
f.lit(0.0),
lambda acc, x: acc
+ x["score"]/f.pow(x["pos"], 2)/f.lit(sum(1 / ((i + 1)**2) for i in range(100)))
Contributor

Why are only the first 100 terms used?
Also, there should be a division by 1.644 somewhere...

Contributor Author

That part is the division by ~1.644.
I initially summed the first 1000 terms (1.6439...) but changed it to 100 (1.6349...).
Should I just change it to f.lit(1.644)?

Contributor

Sorry, I misread it. All is fine, but I would use 1000 to be consistent with the platform (if the documentation is correct).
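
For reference, the two normalisation constants discussed in this thread can be checked in plain Python. This is only a sketch: `basel_partial_sum` is a hypothetical helper name, not something from the PR, and it simply evaluates the same generator expression that appears in the diff for a given number of terms.

```python
import math


def basel_partial_sum(n_terms: int) -> float:
    """Sum of 1 / k**2 for k = 1..n_terms (hypothetical helper, not from the PR)."""
    return sum(1 / ((i + 1) ** 2) for i in range(n_terms))


# The infinite series converges to pi**2 / 6, which is the "~1.644" in the thread.
limit = math.pi ** 2 / 6

for n_terms in (100, 1000):
    partial = basel_partial_sum(n_terms)
    print(f"{n_terms} terms: {partial:.4f} (gap to limit: {limit - partial:.4f})")
```

The gap to the limit is roughly `1 / n_terms`, so moving from 100 to 1000 terms shifts the constant in the third decimal place.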

@vivienho vivienho linked an issue Oct 31, 2024 that may be closed by this pull request
@github-actions github-actions bot added size-M and removed size-S labels Oct 31, 2024
@vivienho vivienho marked this pull request as ready for review October 31, 2024 19:28
Contributor

@DSuveges DSuveges left a comment

This PR adds a new step to gentropy to generate disease/target associations (direct and indirect) based on l2g evidence. This also requires a new method to compute the harmonic sum of values.
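
As a rough illustration of the harmonic sum mentioned here, the following plain-Python sketch mirrors the `score / pos**2 / normalisation` expression visible in the reviewed diff. It is not the PR's PySpark implementation: the function name `harmonic_sum`, the `max_terms` parameter, the descending sort, and the truncation to `max_terms` scores are all assumptions made for the example.

```python
from typing import Iterable


def harmonic_sum(scores: Iterable[float], max_terms: int = 1000) -> float:
    """Normalised harmonic sum of scores (illustrative sketch, not the PR's code).

    Assumptions: scores are ranked in descending order, position starts at 1,
    and at most `max_terms` scores contribute.
    """
    # Normalisation constant: the largest value the weighted sum can reach
    # when every one of the max_terms scores equals 1.0.
    norm = sum(1 / ((i + 1) ** 2) for i in range(max_terms))

    # Rank scores so the strongest evidence gets the largest weight.
    ranked = sorted(scores, reverse=True)[:max_terms]

    # Each score is down-weighted by the square of its 1-based position.
    weighted = sum(score / (pos ** 2) for pos, score in enumerate(ranked, start=1))
    return weighted / norm


if __name__ == "__main__":
    print(harmonic_sum([0.9, 0.4, 0.7]))
```

With this normalisation, the result stays within [0, 1] for scores in [0, 1], and input order does not matter because the scores are ranked first.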

@DSuveges DSuveges merged commit b812f67 into dev Nov 1, 2024
5 checks passed
@DSuveges DSuveges deleted the vh-l2g-associations branch November 1, 2024 11:08
Labels
documentation, Feature, size-M, Step
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add genetics ETL step to generate association data
3 participants