-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add the step class for fine-mapping #554
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only have a few small comments that doesn't affect the logic, so not a deal breaker for merging. My comments are stylisting, if you want to address them, you can before merge.
cred_set.tolist(), | ||
["variantId", "posteriorProbability", "logBF", "beta"], | ||
) | ||
.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this join the mode is inner
by default. Is it expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be 1 to 1 the same size and order
src/gentropy/susie_finemapper.py
Outdated
session (Session): Spark session | ||
_studyId (str): study ID | ||
_region (str): region | ||
_join (DataFrame): DataFrame with variant information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename the _join
variable to something more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/gentropy/susie_finemapper.py
Outdated
order_creds.sort(key=lambda x: x[1], reverse=True) | ||
cred_sets = None | ||
counter = 0 | ||
for i, value in order_creds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i
is fine, this is a scanonical way of calling an index variable, however value
is not very telling. At row 110 it is very complicated what does this value
refer to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed it for cs_lbf_value
susie_output (dict[str, Any]): SuSiE-inf output dictionary | ||
session (Session): Spark session | ||
_studyId (str): study ID | ||
_region (str): region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a consisted way of representing regions? If so, in the args description could be written eg.:
_region (str): finemapped region in chr:start-end format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't have it now. But agree, we need to think about the standard.
src/gentropy/susie_finemapper.py
Outdated
if cred_sets is None: | ||
cred_sets = cred_set | ||
else: | ||
cred_sets = cred_sets.union(cred_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use unionByName
rather union
, because union concatenates columns by positions instead of names. I think in this case it is fine, as the order of columns are defined in rwo 120, but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
_join=gwas_df, | ||
cs_lbf_thr=2, | ||
) | ||
assert isinstance(L1, StudyLocus), "L1 is not an instance of StudyLocus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how does the test dataset look like, it would be great to assert that the number of credible set is what you are expecting, and validate if the locus
object is healthy. However I understand if that is not a high priority for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can create more meaningful test for bigger function that will use this convertor on later stages.
β¨ Context
This PR introduced the fine-mapper step and has a function that converts susie output to study-locus.
π What does this PR implement
See above.
π Missing
Documentation will be updated later.
π¦ Before submitting
dev
branch?make test
)?poetry run pre-commit run --all-files
)?