pycytominer integration #2
Comments
Adding a question closely related to cytomining/DeepProfiler#229 (comment)
For these files specifically (for example, the ones in DeepProfilerExperiments), is parsing the file name the only way to extract Plate, Well, and Site metadata? Or is there a better way? |
We can easily recompute DeepProfiler features, and we're happy to do so to test the new format. Computing features is not as expensive as training a model. Given that we are constantly training and evaluating features, feature computation is fairly routine and can be repeated at any time. So I would suggest ignoring backwards compatibility issues and not trying to rescue old feature files that were already computed. It's easier to delete these files and generate new ones with the best format that we agree on 🙂 I missed the feature file list before. I think this needs additional implementation in DeepProfiler. I will add this comment to our other discussion. |
Great! This will make the code in each DeepProfiler experiment notebook (for each dataset) much cleaner and more streamlined. |
We are recomputing features for Cell Painting datasets. I will make a note in this thread when features are available to start integrating pycytominer in the downstream analysis. |
@jccaicedo I set aside time today to push the DeepProfiler-pycytominer integration further along. Two questions:
Also, one quick note: I went through all of our existing discussion once more and it was fun |
@jccaicedo - The DeepProfiler and CellProfiler comparison analysis keeps popping into my head. I am wondering if there are any updates? I am writing in this thread b/c of the questions I had a couple months ago. I'd like to finalize the DeepProfiler integration in cytomining/pycytominer#78, and I think this is where I can contribute most to your project. |
Hey @jccaicedo, @michaelbornholdt and I just chatted about the remaining steps to add DeepProfiler integration into pycytominer. I think we're very close! Here is a summary of our current plan - please feel free to modify.
Once these things happen, then Michael will be more readily able to benchmark his DeepProfiler comparison experiments! |
@gwaygenomics You can add that file to your tests. The plan looks great! |
Awesome! I added the file in cytomining/pycytominer@d237b41 @michaelbornholdt - I just now realized that |
In cytomining/pycytominer#78 I am working towards integrating DeepProfiler processing into pycytominer. Currently and by default, DeepProfiler outputs `.npz` files storing numpy arrays of single cell profiles. In cytomining/DeepProfiler#229 we discuss a potential update to the `.npz` file output to also include metadata information.
There are a couple of decision points that we need to make to move the integration forward, which will be partially driven by the goals in the DeepProfilerExperiments repo. In cytomining/DeepProfiler#229 (comment) I bring up two different points of consideration: 1) How to use `index.csv` and 2) Feature prefix style.
I think both of these decision points are relatively minor, and any pycytominer code will be flexible to handle multiple metadata options and enable a customizable feature prefix. The question about feature prefix is most directly related to what we think the default prefix should be (`DP` or `DP_` are two options).
Additional topics
I think that these topics are more pressing than the first two listed above: Will the profiles be updated for each dataset to include the metadata `.npz` format? Or, will we proceed without recalculating? If we proceed without recalculating (which I think is the likely scenario), we need to settle on a pycytominer strategy.
Strategy
I do not think that pycytominer should include code to parse plate, well, and site information from filenames. This is a very fragile way of storing these variables - I believe that they should come from an internal source or be stored in an external file that includes file path information pointing to files with corresponding metadata. The latter is also fragile (file names are mutable!), but not as fragile as the metadata-in-file name paradigm.
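To make the "external file" option concrete, here is a minimal sketch of a manifest that maps each feature file path to its metadata. The column names (`feature_path`, `Plate`, `Well`, `Site`) and the `read_manifest` helper are hypothetical, chosen only for illustration; they are not the actual `index.csv` layout or any pycytominer API.

```python
import csv
import io


def read_manifest(handle, path_column="feature_path"):
    """Map each feature file path to its metadata row.

    Assumes a CSV with one row per feature file and a column
    (hypothetically named "feature_path") holding the file path.
    """
    lookup = {}
    for row in csv.DictReader(handle):
        path = row.pop(path_column)
        if path in lookup:
            # A path listed twice would make the metadata ambiguous.
            raise ValueError(f"Duplicate path in manifest: {path}")
        lookup[path] = row
    return lookup


# Illustrative manifest content (made-up values).
manifest_text = (
    "feature_path,Plate,Well,Site\n"
    "outputs/a.npz,P1,A01,1\n"
    "outputs/b.npz,P1,A01,2\n"
)
manifest = read_manifest(io.StringIO(manifest_text))
```

With a lookup like this, metadata never has to be recovered from the file name itself, although the manifest still goes stale if files are renamed or moved.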
However, since we probably won't recompute profiles, we require a strategy to incorporate metadata from file names. Therefore, I propose that we take multiple pycytominer steps to integrate these metadata (instead of dealing with all of the processing internally in pycytominer).
The proposed workflow is as follows:
- `.npz` files in pycytominer
- `load_npz()` output
I will proceed with this strategy for now, but please do suggest alternatives! We can always pivot strategies later on if this ends up being clunky or doesn't reduce code.
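A rough sketch of the multi-step idea: keep file-name parsing outside the loader, then attach the parsed metadata to the loaded features. Everything here is an assumption made for illustration: the `<Plate>_<Well>_<Site>.npz` naming pattern, the `features` array key, the `Metadata_` column prefix, and the helper names. It reads the file with `numpy.load` directly rather than guessing at the signature of pycytominer's `load_npz()`.

```python
import re
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical naming scheme: <Plate>_<Well>_<Site>.npz
FILENAME_PATTERN = re.compile(
    r"^(?P<Plate>[^_]+)_(?P<Well>[^_]+)_(?P<Site>[^_]+)\.npz$"
)


def parse_metadata_from_filename(path):
    """Step outside the loader: pull Plate, Well, Site from the file name."""
    match = FILENAME_PATTERN.match(Path(path).name)
    if match is None:
        raise ValueError(f"File name does not match expected pattern: {path}")
    return match.groupdict()


def load_features(path, array_key="features", prefix="DP_"):
    """Load a 2D feature array (cells x features) into a prefixed DataFrame.

    The array key "features" is an assumption about the .npz contents.
    """
    with np.load(path) as archive:
        features = archive[array_key]
    columns = [f"{prefix}{i}" for i in range(features.shape[1])]
    return pd.DataFrame(features, columns=columns)


def attach_metadata(features_df, metadata):
    """Prepend one constant metadata column per key to every cell row."""
    meta_df = pd.DataFrame(
        {f"Metadata_{key}": [value] * len(features_df)
         for key, value in metadata.items()},
        index=features_df.index,
    )
    return pd.concat([meta_df, features_df], axis=1)
```

Usage would look like `attach_metadata(load_features(path), parse_metadata_from_filename(path))`. Because the parsing step is separate, it could later be swapped for metadata stored inside the `.npz` file without touching the loader.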