Feature selected files available, with more row annotations included #48

Merged
merged 11 commits into from
Mar 21, 2021
1 change: 1 addition & 0 deletions .gitattributes
@@ -1 +1,2 @@
*.gz filter=lfs diff=lfs merge=lfs -text
*.gct filter=lfs diff=lfs merge=lfs -text
10 changes: 7 additions & 3 deletions README.md
@@ -2,11 +2,11 @@

The repository stores data and data processing scripts for **a subset** of the [Broad Drug Repurposing Hub](https://clue.io/repurposing#home) collection of compounds.

In this project, the [Connectivity Map](https://clue.io/team) team perturbed A549 cells with ~1,500 compounds across 6 doses in 5 technical replicates.
In this project, the [Connectivity Map](https://clue.io/team) team perturbed A549 cells with 1,571 compounds across 6 doses in 5 technical replicates.
We refer to this dataset as `LINCS Pilot 1`.

For a specific list of compounds tested, see [`metadata`](https://github.com/broadinstitute/lincs-cell-painting/tree/master/metadata).
Information about the compounds can be interactively explored in the [CLUE Repurposing app](https://clue.io/repurposing-app).
For a specific list of compounds tested, see [`metadata`](https://github.com/broadinstitute/lincs-cell-painting/tree/master/metadata).
Information about the compounds can be interactively explored in the [CLUE Repurposing app](https://clue.io/repurposing-app).
The [Morphology Connectivity Hub](https://clue.io/morphology) is the primary source of this dataset.

## Image-Based Profiling
@@ -23,6 +23,10 @@ For more details about image-based profiling in general, please refer to [Caiced

We use [conda](https://docs.conda.io/en/latest/) to manage the computational environment.

To install conda, see the [instructions](https://docs.conda.io/en/latest/miniconda.html).

We recommend installing conda by downloading and executing the `.sh` file and accepting defaults.

After installing conda, execute the following to install and navigate to the environment:

```bash
# (remaining lines collapsed in the diff view)
```
Git LFS file not shown (9 files)
18 changes: 18 additions & 0 deletions consensus/README.md
@@ -39,3 +39,21 @@ We then recode the dose points into ascending numerical levels and add a new met

Note that we generated per-well consensus signatures for DMSO and per compound-dose pair consensus signatures for compounds.
The per-well DMSO profiles can help to assess plate-associated batch effects.

## Reproduce Pipeline

The pipeline can be reproduced by executing the following:

```bash
# Make sure conda environment is activated
conda activate lincs

# Reproduce the pipeline for producing bulk signatures
ipython scripts/nbconverted/build-consensus-signatures.py
```

`scripts/nbconverted/*.py` were created from the Jupyter notebooks in this folder, like this:

```sh
jupyter nbconvert --to=script --FilesWriter.build_directory=scripts/nbconverted *.ipynb
```
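
For readers new to the term, the per compound-dose consensus signatures described in this README can be pictured as a feature-wise summary over replicate profiles. The sketch below does this with the median, using only the standard library. The compound names, dose levels, and feature names are made up for illustration; the pipeline itself relies on pycytominer (`aggregate` and `modz_base` are imported in the script) rather than on this code.

```python
from collections import defaultdict
from statistics import median

# Hypothetical replicate profiles: (compound, dose level, feature values).
replicates = [
    ("compound_a", 1, {"feat_x": 0.1, "feat_y": 2.0}),
    ("compound_a", 1, {"feat_x": 0.3, "feat_y": 1.0}),
    ("compound_a", 1, {"feat_x": 0.2, "feat_y": 4.0}),
    ("compound_b", 2, {"feat_x": -1.0, "feat_y": 0.5}),
    ("compound_b", 2, {"feat_x": -3.0, "feat_y": 1.5}),
]


def median_consensus(rows):
    """Collapse replicates sharing (compound, dose) to feature-wise medians."""
    grouped = defaultdict(list)
    for compound, dose, features in rows:
        grouped[(compound, dose)].append(features)

    consensus = {}
    for key, profiles in grouped.items():
        feature_names = profiles[0].keys()
        consensus[key] = {
            name: median(profile[name] for profile in profiles)
            for name in feature_names
        }
    return consensus


consensus = median_consensus(replicates)
for key, signature in consensus.items():
    print(key, signature)
```

Each `(compound, dose)` key ends up with exactly one signature, which is the shape of the compound consensus tables written by the pipeline.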
234 changes: 197 additions & 37 deletions consensus/build-consensus-signatures.ipynb

Large diffs are not rendered by default.

124 changes: 109 additions & 15 deletions consensus/scripts/nbconverted/build-consensus-signatures.py
@@ -6,14 +6,18 @@
# Here, we generate consensus signatures for the LINCS Drug Repurposing Hub Cell Painting subset.
# See the project [README.md](README.md) for more details.
#
# This notebook generates four files; one per plate normalization and consensus normalization strategy.
# This notebook generates eight files; one per plate normalization and consensus normalization strategy, with and without feature selection.
#
# | Plate Normalization | Consensus Normalization | Consensus Suffix |
# | :------------------: | :------------------------: | -----------------: |
# | DMSO | Median | `<BATCH>_consensus_median_dmso.csv.gz` |
# | DMSO | MODZ | `<BATCH>_consensus_modz_dmso.csv.gz` |
# | Whole Plate | Median | `<BATCH>_consensus_median.csv.gz` |
# | Whole Plate | MODZ | `<BATCH>_consensus_modz.csv.gz` |
# | Feature Selection | Plate Normalization | Consensus Normalization | Consensus Suffix |
# | :---------------- | :------------------: | :------------------------: | -----------------: |
# | No | DMSO | Median | `<BATCH>_consensus_median_dmso.csv.gz` |
# | No | DMSO | MODZ | `<BATCH>_consensus_modz_dmso.csv.gz` |
# | No | Whole Plate | Median | `<BATCH>_consensus_median.csv.gz` |
# | No | Whole Plate | MODZ | `<BATCH>_consensus_modz.csv.gz` |
# | Yes | DMSO | Median | `<BATCH>_consensus_median_feature_select_dmso.csv.gz` |
# | Yes | DMSO | MODZ | `<BATCH>_consensus_modz_feature_select_dmso.csv.gz` |
# | Yes | Whole Plate | Median | `<BATCH>_consensus_median_feature_select.csv.gz` |
# | Yes | Whole Plate | MODZ | `<BATCH>_consensus_modz_feature_select.csv.gz` |
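
The naming scheme in the table above can be enumerated with a few lines of Python. This is only a sketch: the batch name is a placeholder, the `dmso` key is a made-up label, and the two suffix strings are read off the table rather than taken from the `file_bases` dictionary, which is not shown in this diff.

```python
from itertools import product

batch = "EXAMPLE_BATCH"  # placeholder, not a real batch name

# Suffixes inferred from the table above (assumption, not read from file_bases).
norm_suffixes = {"dmso": "_dmso.csv.gz", "whole_plate": ".csv.gz"}
operations = ["median", "modz"]
feature_select_tags = {"no": "", "yes": "_feature_select"}

# One file per (feature selection, plate normalization, consensus operation).
file_names = [
    f"{batch}_consensus_{operation}{fs_tag}{suffix}"
    for fs_tag, suffix, operation in product(
        feature_select_tags.values(), norm_suffixes.values(), operations
    )
]

for name in file_names:
    print(name)
```

Two feature selection settings, two plate normalizations, and two consensus operations give the eight files listed in the table.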

# In[1]:

@@ -31,6 +35,7 @@

from pycytominer.aggregate import aggregate
from pycytominer.consensus import modz_base
from pycytominer.feature_select import feature_select

from pycytominer.cyto_utils import infer_cp_features

@@ -141,9 +146,9 @@ def consensus_apply(df, operation, cp_features, replicate_cols):
del all_profiles_df


# ## Create Consensus Profiles
# ## Create Consensus Profiles, with and without feature selection
#
# We generate two different consensus profiles for each of the normalization strategies. This generates four different files.
# We generate two different consensus profiles for each of the normalization strategies, with and without feature selection. This generates eight different files.
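
For context on the second consensus operation, the sketch below shows a generic MODZ-style weighting: replicates that correlate well with the other replicates get more weight in the consensus. It is a standard-library approximation written for illustration and is not the code behind pycytominer's `modz_base`; the `min_weight` value, the clipping of negative correlations, and the toy replicate values are all assumptions. It also assumes no replicate is constant across features, since that would make the correlation undefined.

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def modz(replicates, min_weight=0.01):
    """Weight each replicate by its summed (non-negative) correlation to the others."""
    n = len(replicates)
    weights = []
    for i in range(n):
        total = sum(
            max(spearman(replicates[i], replicates[j]), 0.0)
            for j in range(n)
            if j != i
        )
        weights.append(max(total, min_weight))
    weight_sum = sum(weights)
    weights = [w / weight_sum for w in weights]
    n_features = len(replicates[0])
    consensus = [
        sum(w * rep[f] for w, rep in zip(weights, replicates))
        for f in range(n_features)
    ]
    return consensus, weights


# Hypothetical replicate profiles; the third one disagrees with the other two.
toy_replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
]
consensus, weights = modz(toy_replicates)
print(weights)
print(consensus)
```

Compared with a plain median, this kind of weighting lets a single discordant replicate contribute very little to the final signature.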

# In[7]:

@@ -155,12 +160,22 @@ def consensus_apply(df, operation, cp_features, replicate_cols):
    "Metadata_pert_well",
    "Metadata_mmoles_per_liter",
    "Metadata_dose_recode",
    "Metadata_moa",
    "Metadata_target",
]


# In[8]:


# feature selection operations
feature_select_ops = [
    "drop_na_columns",
    "variance_threshold",
    "correlation_threshold",
    "blacklist",
]

all_consensus_dfs = {}
for norm_strat in file_bases:
    all_profiles_df = all_profiles_dfs[norm_strat]
@@ -170,7 +185,9 @@ def consensus_apply(df, operation, cp_features, replicate_cols):
    for operation in operations:
        print(f"Now calculating {operation} consensus for {norm_strat} normalization")

        consensus_profiles[operation] = consensus_apply(
        consensus_profiles[operation] = {}

        consensus_profiles[operation]["no_feat_select"] = consensus_apply(
            all_profiles_df,
            operation=operation,
            cp_features=cp_norm_features,
@@ -179,31 +196,108 @@ def consensus_apply(df, operation, cp_features, replicate_cols):

        # How many DMSO profiles per well?
        print(
            f"There are {consensus_profiles[operation].shape[0]} {operation} consensus profiles for {norm_strat} normalization"
            f"There are {consensus_profiles[operation]['no_feat_select'].shape[0]} {operation} consensus profiles for {norm_strat} normalization"
        )

        # feature selection
        print(
            f"Now feature selecting on {operation} consensus for {norm_strat} normalization"
        )

        consensus_profiles[operation]["feat_select"] = feature_select(
            profiles=consensus_profiles[operation]["no_feat_select"],
            features="infer",
            operation=feature_select_ops,
        )

        # How many features in feature selected profile?
        print(
            f"There are {consensus_profiles[operation]['feat_select'].shape[1]} features in {operation} consensus profiles for {norm_strat} normalization"
        )

    all_consensus_dfs[norm_strat] = consensus_profiles
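
To give a feel for what two of the operations in `feature_select_ops` do conceptually, here is a simplified standard-library stand-in for dropping columns with missing values and dropping highly correlated columns. It is not pycytominer's implementation: the greedy keep-first strategy, the 0.9 threshold, and the toy feature table are assumptions chosen for the example, and the library's actual criteria may differ.

```python
import math

# Hypothetical consensus table: feature name -> values across signatures.
features = {
    "feat_a": [1.0, 2.0, 3.0, 4.0],
    "feat_b": [2.1, 3.9, 6.2, 8.0],  # almost a multiple of feat_a
    "feat_c": [0.5, -1.0, 0.7, 0.1],
    "feat_d": [1.0, math.nan, 2.0, 3.0],  # contains a missing value
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def drop_na_columns(table):
    """Keep only features without any missing values."""
    return {
        name: values
        for name, values in table.items()
        if not any(math.isnan(v) for v in values)
    }


def drop_correlated(table, threshold=0.9):
    """Greedily keep a feature only if it is not too correlated with those already kept."""
    kept = {}
    for name, values in table.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


selected = drop_correlated(drop_na_columns(features))
print(sorted(selected))
```

Because feature selection removes columns, the feature-selected outputs have the same number of rows as their unselected counterparts but fewer feature columns, which is what the `shape[1]` print statement in the script reports.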


# ## Merge and Output Consensus Signatures
# ## Merge and Output Consensus Signatures, with and without feature selection

# In[9]:


float_format = "%5g"
compression = "gzip"

for norm_strat in file_bases:
    file_suffix = file_bases[norm_strat]["output_file_suffix"]
    for operation in operations:

        # No feature selection
        consensus_file = f"{batch}_consensus_{operation}{file_suffix}"
        consensus_file = pathlib.Path(batch, consensus_file)

        consensus_df = all_consensus_dfs[norm_strat][operation]
        consensus_df = all_consensus_dfs[norm_strat][operation]["no_feat_select"]

        print(
            f"Now Writing: Consensus Operation: {operation}; Norm Strategy: {norm_strat}\nFile: {consensus_file}"
            f"Now Writing: Feature selection: No; Consensus Operation: {operation}; Norm Strategy: {norm_strat}\nFile: {consensus_file}"
        )
        print(consensus_df.shape)

        consensus_df.to_csv(
            consensus_file, sep=",", compression="gzip", float_format="%5g", index=False
            consensus_file,
            sep=",",
            compression=compression,
            float_format=float_format,
            index=False,
        )

        # With feature selection
        consensus_feat_df = all_consensus_dfs[norm_strat][operation]["feat_select"]

        consensus_feat_file = (
            f"{batch}_consensus_{operation}_feature_select{file_suffix}"
        )
        consensus_feat_file = pathlib.Path(batch, consensus_feat_file)

        print(
            f"Now Writing: Feature selection: Yes; Consensus Operation: {operation}; Norm Strategy: {norm_strat}\nFile: {consensus_feat_file}"
        )
        print(consensus_feat_df.shape)

        consensus_feat_df.to_csv(
            consensus_feat_file,
            sep=",",
            compression=compression,
            float_format=float_format,
            index=False,
        )
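
The `float_format = "%5g"` setting used above is a printf-style format string, and pandas is expected to apply it to each floating point value when writing the CSV. The snippet below shows what that format string does to a few made-up values using plain Python string formatting: `g` keeps up to six significant digits and switches to scientific notation for very large or very small magnitudes, while the width of 5 pads short results with leading spaces.

```python
float_format = "%5g"

# Made-up example values covering short, long, large, and tiny magnitudes.
examples = [1.0, 0.123456789, 1234567.0, -0.00001234]
formatted = [float_format % value for value in examples]

for value, text in zip(examples, formatted):
    print(repr(value), "->", repr(text))
```

The practical effect is smaller output files at the cost of precision beyond six significant digits.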


# ## Save whole plate MODZ consensus signature as GCT
#
# Whole-plate-normalized + MODZ aggregated consensus profiles will be made available on clue.io/morphology as a GCT file.

# In[11]:


import pycytominer.write_gct

operation = "modz"
norm_strat = "whole_plate"
file_suffix = ".gct"
consensus_file = f"{batch}_consensus_{operation}{file_suffix}"
consensus_file = pathlib.Path(batch, consensus_file)

consensus_df = all_consensus_dfs[norm_strat][operation]["no_feat_select"]

print(
    f"Now Writing: Consensus Operation: {operation}; Norm Strategy: {norm_strat}\nFile: {consensus_file}"
)
print(consensus_df.shape)

pycytominer.write_gct(consensus_df, consensus_file)


# In[ ]:




1 change: 1 addition & 0 deletions environment.yml
@@ -2,6 +2,7 @@ name: lincs
channels:
- conda-forge
dependencies:
- pip=21.0.1
- conda-forge::pandas=1.0.1
- conda-forge::tabulate=0.8.7
- conda-forge::jupyter=1.0.0