Feature selected files available, with more row annotations included #48

Merged (11 commits) on Mar 21, 2021
`README.md`: 9 changes (7 additions, 2 deletions)
```diff
@@ -3,17 +3,18 @@
 The Library of Integrated Network-Based Cellular Signatures (LINCS) Project aims to create publicly available resources to characterize how cells respond to perturbation.
 This repository stores Cell Painting readouts and associated data-processing pipelines for the LINCS Cell Painting dataset.

+In this project, the [Connectivity Map](https://clue.io/team) team perturbed A549 cells with 1,571 compounds across 6 doses in 5 technical replicates.
+The data represent **a subset** of the [Broad Drug Repurposing Hub](https://clue.io/repurposing#home) collection of compounds.

-In this project, the [Connectivity Map](https://clue.io/team) team perturbed A549 cells with ~1,500 compounds across 6 doses in 5 technical replicates.
 We refer to this dataset as `LINCS Pilot 1`.
 We also include data for the second batch of LINCS Cell Painting data, which we refer to as `LKCP`.

 For a specific list of compounds tested, see [`metadata`](https://github.com/broadinstitute/lincs-cell-painting/tree/master/metadata).
 You can interactively explore information about the compounds in the [CLUE Repurposing app](https://clue.io/repurposing-app).

 The [Morphology Connectivity Hub](https://clue.io/morphology) is the primary source of this dataset.

-## Image-Based profiling
+## Image-based profiling

 We apply a unified, image-based profiling pipeline to all 136 384-well plates from `LINCS Pilot 1`, and all 135 384-well plates from `LKCP`.
 We use [pycytominer](https://github.com/cytomining/pycytominer) as the primary tool for image-based profiling.
```
```diff
@@ -27,6 +28,10 @@ For more details about image-based profiling in general, please refer to [Caiced

 We use [conda](https://docs.conda.io/en/latest/) to manage the computational environment.

+To install conda see [instructions](https://docs.conda.io/en/latest/miniconda.html).
+
+We recommend installing conda by downloading and executing the `.sh` file and accepting defaults.
+
 After installing conda, execute the following to install and navigate to the environment:

 ```bash
```
Git LFS files not shown (8 files).
`consensus/README.md`: 18 changes (18 additions, 0 deletions)
```diff
@@ -39,3 +39,21 @@ We then recode the dose points into ascending numerical levels and add a new met

 Note we generated per-well DMSO consensus signatures and per compound-dose pair consensus signatures for compounds.
 The per-well DMSO profiles can help to assess plate-associated batch effects.
+
+## Reproduce Pipeline
+
+The pipeline can be reproduced by executing the following:
+
+```bash
+# Make sure conda environment is activated
+conda activate lincs
+
+# Reproduce the pipeline for producing bulk signatures
+ipython scripts/nbconverted/build-consensus-signatures.py
+```
+
+`scripts/nbconverted/*.py` were created from the Jupyter notebooks in this folder, like this:
+
+```sh
+jupyter nbconvert --to=script --FilesWriter.build_directory=scripts/nbconverted *.ipynb
+```
```
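The README text above mentions per compound-dose pair consensus signatures. The example below is a minimal sketch of the median variant that uses only the standard library. The hypothetical helper `median_consensus` groups well-level profiles by a tuple of metadata columns and takes the per-feature median within each group. The row-as-dict layout and the column names `Metadata_compound` and `Feature_1` are assumptions for illustration; only `Metadata_dose_recode` appears in this pull request. This is not the notebook's `consensus_apply`, whose body is not shown in the diff.

```python
from collections import defaultdict
from statistics import median


def median_consensus(rows, group_cols, feature_cols):
    """Collapse replicate profiles into one median profile per group.

    rows: list of dicts, one per well-level profile (hypothetical layout).
    group_cols: metadata columns that define a consensus group (e.g. compound, dose).
    feature_cols: numeric feature columns to summarize.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_cols)
        groups[key].append(row)

    consensus_rows = []
    for key, members in groups.items():
        out = dict(zip(group_cols, key))  # carry the grouping metadata forward
        for feature in feature_cols:
            out[feature] = median(member[feature] for member in members)
        consensus_rows.append(out)
    return consensus_rows


# Hypothetical well-level profiles: three replicates at dose 1, one at dose 2.
wells = [
    {"Metadata_compound": "cpdA", "Metadata_dose_recode": 1, "Feature_1": 1.0},
    {"Metadata_compound": "cpdA", "Metadata_dose_recode": 1, "Feature_1": 3.0},
    {"Metadata_compound": "cpdA", "Metadata_dose_recode": 1, "Feature_1": 2.0},
    {"Metadata_compound": "cpdA", "Metadata_dose_recode": 2, "Feature_1": 5.0},
]
signatures = median_consensus(
    wells, ["Metadata_compound", "Metadata_dose_recode"], ["Feature_1"]
)
print(signatures)
```

The grouping key is a tuple, so adding further metadata columns to a consensus group only requires extending `group_cols`.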
`consensus/build-consensus-signatures.ipynb`: 158 changes (120 additions, 38 deletions)

Large diffs are not rendered by default.

`consensus/scripts/nbconverted/build-consensus-signatures.py`: 94 changes (78 additions, 16 deletions)
```diff
@@ -6,14 +6,18 @@
 # Here, we generate consensus signatures for the LINCS Drug Repurposing Hub Cell Painting subset.
 # See the project [README.md](README.md) for more details.
 #
-# This notebook generates four files; one per plate normalization and consensus normalization strategy.
+# This notebook generates eight files; one per plate normalization and consensus normalization strategy, with and without feature selection.
 #
-# | Plate Normalization | Consensus Normalization | Consensus Suffix |
-# | :------------------: | :------------------------: | -----------------: |
-# | DMSO | Median | `<BATCH>_consensus_median_dmso.csv.gz` |
-# | DMSO | MODZ | `<BATCH>_consensus_modz_dmso.csv.gz` |
-# | Whole Plate | Median | `<BATCH>_consensus_median.csv.gz` |
-# | Whole Plate | MODZ | `<BATCH>_consensus_modz.csv.gz` |
+# |Feature selection | Plate Normalization | Consensus Normalization | Consensus Suffix |
+# |:---------------- | :------------------: | :------------------------: | -----------------: |
+# | No | DMSO | Median | `<BATCH>_consensus_median_dmso.csv.gz` |
+# | No | DMSO | MODZ | `<BATCH>_consensus_modz_dmso.csv.gz` |
+# | No | Whole Plate | Median | `<BATCH>_consensus_median.csv.gz` |
+# | No | Whole Plate | MODZ | `<BATCH>_consensus_modz.csv.gz` |
+# | Yes | DMSO | Median | `<BATCH>_consensus_median_feature_select_dmso.csv.gz` |
+# | Yes | DMSO | MODZ | `<BATCH>_consensus_modz_feature_select_dmso.csv.gz` |
+# | Yes | Whole Plate | Median | `<BATCH>_consensus_median_feature_select.csv.gz` |
+# | Yes | Whole Plate | MODZ | `<BATCH>_consensus_modz_feature_select.csv.gz` |

 # In[1]:
```
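The table describes a 2 × 2 × 2 grid: feature selection (no/yes), plate normalization (DMSO/whole plate), and consensus operation (median/MODZ). The sketch below regenerates the eight suffixes from that grid. It uses the same f-string patterns as the output cell, `{batch}_consensus_{operation}{file_suffix}` and its `_feature_select` variant, so it can be used to check that a run produced every expected file. The helper `consensus_filenames` and the `NORM_SUFFIXES` keys are hypothetical. The suffix values are read off the table because the notebook's `file_bases` dictionary is not part of this diff.

```python
from itertools import product

# Hypothetical keys; the suffix values are read off the table above.
NORM_SUFFIXES = {"dmso": "_dmso.csv.gz", "whole_plate": ".csv.gz"}
OPERATIONS = ["median", "modz"]


def consensus_filenames(batch):
    """Return the eight expected consensus file names for one batch."""
    names = []
    for with_feature_select, norm, operation in product(
        [False, True], NORM_SUFFIXES, OPERATIONS
    ):
        tag = "_feature_select" if with_feature_select else ""
        names.append(f"{batch}_consensus_{operation}{tag}{NORM_SUFFIXES[norm]}")
    return names


for name in consensus_filenames("<BATCH>"):
    print(name)
```

Iterating with `itertools.product` in the order feature selection, normalization, operation reproduces the table's row order.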

```diff
@@ -31,7 +35,7 @@

 from pycytominer.aggregate import aggregate
 from pycytominer.consensus import modz_base
-
+from pycytominer.feature_select import feature_select
 from pycytominer.cyto_utils import infer_cp_features
```
```diff
@@ -141,9 +145,9 @@ def consensus_apply(df, operation, cp_features, replicate_cols):
 del all_profiles_df


-# ## Create Consensus Profiles
+# ## Create Consensus Profiles, with and without feature selection
 #
-# We generate two different consensus profiles for each of the normalization strategies. This generates four different files.
+# We generate two different consensus profiles for each of the normalization strategies, with and without feature selection. This generates eight different files.

 # In[7]:
```
```diff
@@ -155,12 +159,22 @@ def consensus_apply(df, operation, cp_features, replicate_cols):
     "Metadata_pert_well",
     "Metadata_mmoles_per_liter",
     "Metadata_dose_recode",
+    "Metadata_moa",
+    "Metadata_target",
 ]


 # In[8]:


+# feature selection operations
+feature_select_ops = [
+    "drop_na_columns",
+    "variance_threshold",
+    "correlation_threshold",
+    "blacklist",
+]
+
 all_consensus_dfs = {}
 for norm_strat in file_bases:
     all_profiles_df = all_profiles_dfs[norm_strat]
```
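The `feature_select_ops` list passes four operations to pycytominer's `feature_select`. The sketch below makes two of them concrete with deliberately simplified plain-Python stand-ins. The first drops columns whose fraction of missing values exceeds a cutoff. The second is a greedy filter that discards a feature if it is highly correlated with one already kept. This is not pycytominer's implementation. The cutoffs (`na_cutoff=0.05`, `corr_threshold=0.9`), the keep-first rule, and the column names are all assumptions chosen for illustration. `variance_threshold` and `blacklist` are not reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # treat a constant column as uncorrelated with everything
    return cov / (sd_x * sd_y)


def drop_na_columns(columns, na_cutoff=0.05):
    """Keep columns whose fraction of missing values (None) is at most na_cutoff."""
    kept = {}
    for name, values in columns.items():
        na_fraction = sum(v is None for v in values) / len(values)
        if na_fraction <= na_cutoff:
            kept[name] = values
    return kept


def greedy_correlation_filter(columns, corr_threshold=0.9):
    """Keep a column only if |r| with every already-kept column is <= corr_threshold.

    Columns are visited in insertion order, so the earlier member of a highly
    correlated pair is the one retained. Assumes no missing values remain.
    """
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= corr_threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical feature columns (one list per feature, one value per profile).
profiles = {
    "Feature_A": [1.0, 2.0, 3.0, 4.0],
    "Feature_B": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * Feature_A, so r = 1
    "Feature_C": [4.0, 1.0, 3.0, 2.0],  # r = -0.4 with Feature_A
    "Feature_D": [1.0, None, 2.0, 3.0],  # 25% missing
}

selected = greedy_correlation_filter(drop_na_columns(profiles))
print(sorted(selected))  # ['Feature_A', 'Feature_C']
```

The greedy rule depends on column order, so pycytominer may choose a different member of a correlated pair to drop. Its documentation describes the exact criterion.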
```diff
@@ -170,7 +184,9 @@ def consensus_apply(df, operation, cp_features, replicate_cols):
     for operation in operations:
         print(f"Now calculating {operation} consensus for {norm_strat} normalization")

-        consensus_profiles[operation] = consensus_apply(
+        consensus_profiles[operation] = {}
+
+        consensus_profiles[operation]["no_feat_select"] = consensus_apply(
             all_profiles_df,
             operation=operation,
             cp_features=cp_norm_features,
```
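The table pairs each plate normalization with two consensus operations, median and MODZ, and the notebook imports `modz_base` from `pycytominer.consensus`. In Connectivity Map's L1000 work, MODZ collapses replicates into a weighted average. Each replicate's weight reflects its Spearman correlation with the other replicates, so discordant replicates contribute less. The block below is only a plain-Python sketch of that idea, not `modz_base` itself, and it makes three simplifying assumptions. Ties in the ranking are broken by position rather than averaged. Each weight is the replicate's mean correlation with the others. The minimum weight is set to 0.01.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def ranks(values):
    """0-based ranks; ties are broken by position (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(xs), ranks(ys))


def modz_weights(replicates, min_weight=0.01):
    """Weight each replicate by its mean Spearman correlation with the others."""
    n = len(replicates)
    if n == 1:
        return [1.0]
    raw = []
    for i in range(n):
        corrs = [spearman(replicates[i], replicates[j]) for j in range(n) if j != i]
        # Clip so a poorly agreeing replicate keeps a small positive weight.
        raw.append(max(sum(corrs) / len(corrs), min_weight))
    total = sum(raw)
    return [w / total for w in raw]


def modz(replicates, min_weight=0.01):
    """Weighted average of replicate profiles, feature by feature."""
    weights = modz_weights(replicates, min_weight)
    n_features = len(replicates[0])
    return [
        sum(w * rep[k] for w, rep in zip(weights, replicates))
        for k in range(n_features)
    ]


# Four hypothetical replicate profiles of one compound-dose pair (4 features each).
replicates = [
    [0.1, 0.5, 0.9, 1.3],
    [0.2, 0.4, 1.0, 1.2],
    [0.0, 0.6, 0.8, 1.4],
    [1.3, 0.9, 0.5, 0.1],  # feature ranks reversed relative to the other three
]
weights = modz_weights(replicates)
consensus = modz(replicates)
print([round(w, 3) for w in weights])  # [0.33, 0.33, 0.33, 0.01]
```

In this example the median would treat the reversed replicate like any other, whereas the correlation-based weights push it to the minimum.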
```diff
@@ -179,31 +195,77 @@ def consensus_apply(df, operation, cp_features, replicate_cols):

         # How many DMSO profiles per well?
         print(
-            f"There are {consensus_profiles[operation].shape[0]} {operation} consensus profiles for {norm_strat} normalization"
+            f"There are {consensus_profiles[operation]['no_feat_select'].shape[0]} {operation} consensus profiles for {norm_strat} normalization"
         )

+        # feature selection
+        print(
+            f"Now feature selecting on {operation} consensus for {norm_strat} normalization"
+        )
+
+        consensus_profiles[operation]["feat_select"] = feature_select(
+            profiles=consensus_profiles[operation]["no_feat_select"],
+            features="infer",
+            operation=feature_select_ops,
+        )
+
+        # How many features in feature selected profile?
+        print(
+            f"There are {consensus_profiles[operation]['feat_select'].shape[1]} features in {operation} consensus profiles for {norm_strat} normalization"
+        )
+
     all_consensus_dfs[norm_strat] = consensus_profiles


-# ## Merge and Output Consensus Signatures
+# ## Merge and Output Consensus Signatures, with and without feature selection

 # In[9]:


+float_format = "%5g"
+compression = "gzip"
+
 for norm_strat in file_bases:
     file_suffix = file_bases[norm_strat]["output_file_suffix"]
     for operation in operations:
+
+        # No feature selection
         consensus_file = f"{batch}_consensus_{operation}{file_suffix}"
         consensus_file = pathlib.Path(batch, consensus_file)

-        consensus_df = all_consensus_dfs[norm_strat][operation]
+        consensus_df = all_consensus_dfs[norm_strat][operation]["no_feat_select"]

         print(
-            f"Now Writing: Consensus Operation: {operation}; Norm Strategy: {norm_strat}\nFile: {consensus_file}"
+            f"Now Writing: Feature selection: No; Consensus Operation: {operation}; Norm Strategy: {norm_strat}\nFile: {consensus_file}"
         )
         print(consensus_df.shape)

         consensus_df.to_csv(
-            consensus_file, sep=",", compression="gzip", float_format="%5g", index=False
+            consensus_file,
+            sep=",",
+            compression=compression,
+            float_format=float_format,
+            index=False,
         )
+
+        # With feature selection
+        consensus_feat_df = all_consensus_dfs[norm_strat][operation]["feat_select"]
+
+        consensus_feat_file = (
+            f"{batch}_consensus_{operation}_feature_select{file_suffix}"
+        )
+        consensus_feat_file = pathlib.Path(batch, consensus_feat_file)
+
+        print(
+            f"Now Writing: Feature selection: Yes; Consensus Operation: {operation}; Norm Strategy: {norm_strat}\nFile: {consensus_feat_file}"
+        )
+        print(consensus_feat_df.shape)
+
+        consensus_feat_df.to_csv(
+            consensus_feat_file,
+            sep=",",
+            compression=compression,
+            float_format=float_format,
+            index=False,
+        )
```
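Both writers pass `float_format="%5g"` and `compression="gzip"` to pandas' `DataFrame.to_csv`. `%5g` is a printf-style format. The `g` conversion keeps six significant digits and switches to exponent notation for very large or very small magnitudes. The `5` sets a minimum field width, so short values are left-padded with spaces. The standard-library sketch below shows what that format string does to individual values. It formats a tiny hypothetical table with `%5g`, writes it through `gzip` and `csv`, and reads it back. It mirrors the format string only, not pandas' writer, and the column names and file name are made up.

```python
import csv
import gzip
import os
import tempfile

FLOAT_FORMAT = "%5g"  # same format string as in the notebook

# Hypothetical rows; only the format string comes from the notebook.
rows = [
    {"Metadata_dose_recode": 1, "Feature_1": 0.123456789, "Feature_2": 1.0},
    {"Metadata_dose_recode": 2, "Feature_1": 123456789.0, "Feature_2": -0.5},
]


def format_value(value):
    """Apply the %-style float format to floats; leave other values unchanged."""
    return FLOAT_FORMAT % value if isinstance(value, float) else value


path = os.path.join(tempfile.mkdtemp(), "example_consensus.csv.gz")
with gzip.open(path, "wt", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    for row in rows:
        writer.writerow({key: format_value(val) for key, val in row.items()})

with gzip.open(path, "rt", newline="") as handle:
    written = list(csv.DictReader(handle))

print(written)
```

Python's `float()` accepts the padded strings, but after the round trip each value carries at most six significant digits.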

`environment.yml`: 1 change (1 addition, 0 deletions)
```diff
@@ -2,6 +2,7 @@ name: lincs
 channels:
 - conda-forge
 dependencies:
+- pip=21.0.1
 - conda-forge::pandas=1.0.1
 - conda-forge::tabulate=0.8.7
 - conda-forge::jupyter=1.0.0
```