Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding functionality to aggregate and annotate DeepProfiler output #78

Merged
merged 36 commits into from
Jun 4, 2021

Conversation

gwaybio
Copy link
Member

@gwaybio gwaybio commented May 8, 2020

Description

In this pull request, we complete the integration of DeepProfiler output and pycytominer processing.

We discuss the implementation in broadinstitute/DeepProfilerExperiments#2.

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@gwaybio gwaybio marked this pull request as draft May 8, 2020 19:19
@gwaybio
Copy link
Member Author

gwaybio commented Sep 25, 2020

I added the real data tests, but the tests will fail until I add the data. The data will be added once broadinstitute/DeepProfilerExperiments#2 (comment) is resolved.

@gwaybio
Copy link
Member Author

gwaybio commented May 14, 2021

@michaelbornholdt - one thing that you can add to this PR are numpy style docstrings (which are compatible with Sphinx)

Example:

"""Combine population dataframe variables by strata groups using given operation.
Parameters
----------
population_df : pandas.core.frame.DataFrame
DataFrame to group and aggregate.
strata : list of str, default ["Metadata_Plate", "Metadata_Well"]
Columns to groupby and aggregate.
features : list of str, default "all"
List of features that should be aggregated.
operation : str, default "median"
How the data is aggregated. Currently only supports one of ['mean', 'median'].
output_file : str or file handle, optional
If provided, will write aggregated profiles to file. If not specified, will return the aggregated profiles.
We recommend naming the file based on the plate name.
compute_object_count : bool, default False
Whether or not to compute object counts.
object_feature : str, default "ObjectNumber"
Object number feature. Only used if compute_object_count=True.
subset_data_df : pandas.core.frame.DataFrame
How to subset the input.
compression_options : str, optional
The mechanism to compress.
float_format : str, optional
Decimal precision to use in writing output file.
Returns
-------
pandas.core.frame.DataFrame
DataFrame of aggregated features.
"""

If you're looking deeply at this code, I suspect that you'll need to figure this out on a per-argument basis anyway. Thanks!

@gwaybio
Copy link
Member Author

gwaybio commented May 14, 2021

(once you accept the write access invitation, you'll be able to commit directly to this branch)

@gwaybio gwaybio marked this pull request as ready for review May 14, 2021 20:45
Copy link
Member

@niranjchandrasekaran niranjchandrasekaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gwaygenomics I have made some comments (mostly about docstrings) and have a couple of suggestions. Other than that everything looks good.

I also noticed that the PR template was missing. I realize that this PR is old but can it still be added?

pycytominer/cyto_utils/load.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/load.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/DeepProfiler_processing.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/DeepProfiler_processing.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/DeepProfiler_processing.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/DeepProfiler_processing.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/DeepProfiler_processing.py Outdated Show resolved Hide resolved
@gwaybio
Copy link
Member Author

gwaybio commented May 19, 2021

Awesome @niranjchandrasekaran - this is great.

@michaelbornholdt is still actively working on this PR, but you may have just saved him lots of time. In general, I love the idea of having you review this PR, since it technically will be a joint contribution between Michael and I.

Michael, what do you think? I propose that we:

  1. Incorporate Niranj's suggestions
  2. Complete the work as outlined in pycytominer integration broadinstitute/DeepProfilerExperiments#2 (comment)
  3. Re-request Niranj's review
  4. Merge once we're all satisfied

@gwaybio
Copy link
Member Author

gwaybio commented May 19, 2021

ooh, I see what happened now, I think once I marked the pull request as ready for review, the CODEOWNERS file triggered a review request:

gwaygenomics requested a review from niranjchandrasekaran as a code owner 5 days ago

In the future and to save Niranj time, I will not be so quick to mark a PR as ready-for-review

@gwaybio
Copy link
Member Author

gwaybio commented May 19, 2021

I also noticed that the PR template was missing. I realize that this PR is old but can it still be added?

Yeah, it's an old PR - but I agree adding it would be helpful. @michaelbornholdt, when you are happy with your fixes, please go through the checklist above.

@michaelbornholdt
Copy link
Contributor

@gwaygenomics
Done. Don't know what that check is and why its failing.

I also make sure to reset_index(drop=True) since we are merging metadata by index and this is easy to mixup
@codecov-commenter
Copy link

codecov-commenter commented May 21, 2021

Codecov Report

Merging #78 (d2db947) into master (fdd0e09) will decrease coverage by 0.10%.
The diff coverage is 97.43%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #78      +/-   ##
==========================================
- Coverage   98.64%   98.54%   -0.11%     
==========================================
  Files          46       48       +2     
  Lines        1998     2197     +199     
==========================================
+ Hits         1971     2165     +194     
- Misses         27       32       +5     
Impacted Files Coverage Δ
pycytominer/cyto_utils/load.py 88.00% <90.00%> (+1.33%) ⬆️
pycytominer/cyto_utils/DeepProfiler_processing.py 96.84% <96.84%> (ø)
pycytominer/cyto_utils/__init__.py 100.00% <100.00%> (ø)
...ts/test_cyto_utils/test_DeepProfiler_processing.py 100.00% <100.00%> (ø)
pycytominer/tests/test_cyto_utils/test_load.py 100.00% <100.00%> (ø)
pycytominer/tests/test_aggregate.py 100.00% <0.00%> (ø)
pycytominer/tests/test_cyto_utils/test_cells.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fdd0e09...d2db947. Read the comment docs.

@gwaybio
Copy link
Member Author

gwaybio commented May 24, 2021

@niranjchandrasekaran - this is back to you. Thanks!

Copy link
Member

@niranjchandrasekaran niranjchandrasekaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gwaygenomics @michaelbornholdt LGTM! You may want to update the docstrings for the old functions in pycytominer/cyto_utils/load.py while you are at it. But it can also wait. Nevertheless I have approved the PR.

@gwaybio
Copy link
Member Author

gwaybio commented Jun 4, 2021

awesome, thanks @niranjchandrasekaran !

You may want to update the docstrings for the old functions in pycytominer/cyto_utils/load.py

I definitely think we should update docstrings as we go - but I think we did update these functions? Maybe I'm missing something or you mean a different file?

Either way, that potential documentation update is relatively minor - which is great!

@michaelbornholdt is this ready to merge? If so, I will go ahead.

@niranjchandrasekaran
Copy link
Member

I definitely think we should update docstrings as we go - but I think we did update these functions? Maybe I'm missing something or you mean a different file?

Ah, it was just a single function.

def infer_delim(file):
"""
Sniff the delimiter in the given file
Arguments
---------
file - a string indicating file name

@michaelbornholdt
Copy link
Contributor

@michaelbornholdt is this ready to merge? If so, I will go ahead.

All good! Go ahead.
I already have the next PR set up in my head 🤣

@gwaybio
Copy link
Member Author

gwaybio commented Jun 4, 2021

Ah, it was just a single function.

ooo, good catch. Thanks. I will update.

All good! Go ahead.
I already have the next PR set up in my head 🤣

Wonderful. Can't wait! 😈

@gwaybio gwaybio merged commit 4f37e01 into cytomining:master Jun 4, 2021
@gwaybio gwaybio deleted the add-deepprofiler-processing branch June 4, 2021 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants