Add barcode logic to CytoSnake's CLI #46

Merged
merged 21 commits into from
May 9, 2023
Changes from 6 commits
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@ repos:
     rev: v1.1.0
     hooks:
       - id: sourcery
-        args: [--diff=git diff HEAD, --no-summary]
+        args: [--diff=git diff HEAD, --fix, --no-summary]

   # snakemake formatting
   - repo: https://github.com/snakemake/snakefmt
115 changes: 115 additions & 0 deletions configs/wf_configs/cp_process.yaml
@@ -0,0 +1,115 @@
name: cp_process

# Documentation
docs: |
  Description:
  ------------
  Traditional workflow

  Workflow Steps:
  ---------------
  The workflow steps are described below, one section per step.

  aggregate_configs:
    Aggregates single-cell morphology at a given level of aggregation. For example,
    one can aggregate single cells at the well level. This means that all cells within
    a well are aggregated into a single data point containing all morphology features.

  annotate_configs:
    Adds metadata to the given dataset. This includes information regarding well
    position, types of perturbations, etc.

  normalize_configs:
    Applies normalization to the given dataset.

  feature_select_configs:
    Selects features from the given dataset.

  consensus_configs:
    Creates a consensus profile. Consensus profiles are unique signatures that are mapped
    to a given perturbation.

annotate_configs:
  params:
    join_on:
      - Metadata_well_position
      - Image_Metadata_Well
    add_metadata_id_to_platemap: True
    format_broad_cmap: False
    clean_cellprofiler: True
    external_metadata: "none"
    external_join_left: "none"
    external_join_right: "none"
    compression_options:
      method: "gzip"
      mtime: 1
Comment on lines +43 to +45
Member

Seeing how patterns in the configs occur (like compression_options), consider making use of global variable references (if possible) within these files. These could be referenced within the individual fields to ensure consistency and reduce maintenance in the instances where change is required.

Member Author

Hmm. It seems you can assign variables (YAML anchors) within the configuration and reuse them (learned something new today).

Here's an example below:

# test.yaml file
compression_options: &DEFAULT_COMPRESSION
  method: "gzip"
  mtime: 1

first_value: *DEFAULT_COMPRESSION

Here's the code used to read the test YAML file:

import yaml
with open("./test.yaml", mode="r") as f:
    data = yaml.safe_load(f)
print(data)

And here's the output:

{'compression_options': {'method': 'gzip', 'mtime': 1},
 'first_value': {'method': 'gzip', 'mtime': 1}}

So it seems you can. However, I could see some issues with this in terms of readability. Maybe I am naïve, but I have not seen config files that use global variables within them. This raises the question of whether using config variables is common practice. For example, do we expect the majority of users to know what these variables are and how they are used within the config file? (I do not know the right answer to this; maybe you, @MattsonCam, and/or @gwaybio might know.)

I can see this working perfectly in the private configs that exist within the .cytosnake directory for development purposes, but I am quite "iffy" about how users will react to variables in config files.

Member

Great findings! One way you could consider addressing understandability here would be with comments (for ex: # this line creates a global config variable used later as *VARIABLE_NAME). I understand reasons why you might consider not using this method and defer to you on what's best.
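
If anchors in user-facing configs feel too opaque, one alternative is to keep the shared default in a single place in Python and fill it in after the YAML is loaded. The sketch below assumes that approach; `DEFAULT_COMPRESSION` and `apply_shared_defaults` are hypothetical names and are not part of CytoSnake, and the `loaded` dict only stands in for what `yaml.safe_load` would return.

```python
from copy import deepcopy

# Hypothetical shared default, mirroring the block repeated in cp_process.yaml.
DEFAULT_COMPRESSION = {"method": "gzip", "mtime": 1}


def apply_shared_defaults(config: dict) -> dict:
    """Return a copy of config in which every step section that omits
    compression_options receives the shared default."""
    resolved = deepcopy(config)
    for section in resolved.values():
        # skip top-level entries such as `name` or `docs` that are not step sections
        if not isinstance(section, dict) or "params" not in section:
            continue
        # setdefault leaves an explicit value (including null/None) untouched
        section["params"].setdefault(
            "compression_options", deepcopy(DEFAULT_COMPRESSION)
        )
    return resolved


# Stand-in for the dict that yaml.safe_load would return.
loaded = {
    "name": "cp_process",
    "aggregate_configs": {"params": {"operation": "median"}},
    "consensus_config": {"params": {"compression_options": None}},
}
resolved = apply_shared_defaults(loaded)
```

With this design the YAML stays free of anchor syntax, at the cost of the default no longer being visible in the config file itself.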

    float_format: null
    cmap_args: {}

aggregate_configs:
  params:
    strata:
      - Metadata_Plate
      - Metadata_Well
    features: infer
    operation: median
    output_file: none
    compute_object_count: False
    object_feature: Metadata_ObjectNumber
    subset_data_df: none
    compression_options:
      method: gzip
      mtime: 1
    float_format: null

normalize_configs:
  params:
    features: infer
    image_features: False
    meta_features: infer
    samples: all
    method: mad_robustize
    compression_options:
      method: gzip
      mtime: 1
    float_format: null
    mad_robustize_epsilon: 1.0e-18
    spherize_center: True
    spherize_method: ZCA-cor
    spherize_epsilon: 1.0e-6

feature_select_configs:
  params:
    features: infer
    image_features: False
    samples: all
    operation:
      - variance_threshold
      - drop_na_columns
      - correlation_threshold
      - drop_outliers
      - blocklist
    na_cutoff: 0.05
    corr_threshold: 0.9
    corr_method: pearson
    freq_cut: 0.05
    unique_cut: 0.1
    compression_options:
      method: gzip
      mtime: 1
    float_format: null
    blocklist_file: null
    outlier_cutoff: 15
    noise_removal_perturb_groups: null
    noise_removal_stdev_cutoff: null

consensus_config:
  params:
    replicate_columns:
      - Metadata_cell_line
      - Metadata_pert_name
    operation: median
    features: infer
    compression_options: null
    float_format: null
    modz_args: { "method": "spearman" }
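
The `docs` block names five sections that all end in `_configs`, while the last section key in the file is `consensus_config`. Whether that difference is intentional cannot be told from this diff. A small check along these lines could surface such mismatches when the config is loaded; it is only a sketch, and `DOCUMENTED_SECTIONS` and `find_section_mismatches` are hypothetical names that do not exist in CytoSnake.

```python
# Section names taken from the `docs` block of cp_process.yaml.
DOCUMENTED_SECTIONS = {
    "aggregate_configs",
    "annotate_configs",
    "normalize_configs",
    "feature_select_configs",
    "consensus_configs",
}


def find_section_mismatches(config: dict) -> tuple[set, set]:
    """Return (documented but absent, present but undocumented) section names."""
    # a step section is any top-level mapping that carries a `params` key
    present = {
        key
        for key, value in config.items()
        if isinstance(value, dict) and "params" in value
    }
    return DOCUMENTED_SECTIONS - present, present - DOCUMENTED_SECTIONS


# Stand-in for the loaded YAML, keeping only the keys relevant to the check.
loaded = {
    "name": "cp_process",
    "docs": "...",
    "annotate_configs": {"params": {}},
    "aggregate_configs": {"params": {}},
    "normalize_configs": {"params": {}},
    "feature_select_configs": {"params": {}},
    "consensus_config": {"params": {}},
}
missing, undocumented = find_section_mismatches(loaded)
```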
7 changes: 6 additions & 1 deletion cytosnake/cli/cmd.py
@@ -9,12 +9,14 @@
import sys
from pathlib import Path

# cytosnake imports
from cytosnake.cli.args import CliControlPanel
from cytosnake.cli.cli_docs import cli_docs, init_doc, run_doc
from cytosnake.cli.exec.workflow_exec import workflow_executor
from cytosnake.cli.setup_init import init_cp_data, init_dp_data
from cytosnake.common.errors import ProjectExistsError, WorkflowFailedException

# cytosnake imports
from cytosnake.guards.input_guards import check_init_parameter_inputs
from cytosnake.utils import cyto_paths
from cytosnake.utils.cytosnake_setup import setup_cytosnake_env

@@ -63,6 +65,9 @@ def run_cmd() -> None:
logging.info(msg="Formatting input files")
init_args = args_handler.parse_init_args()

# before setting up, check the logic of the input parameters
check_init_parameter_inputs(user_params=init_args)

# identifying which data type was added and how to set it up
match init_args.datatype:
case "cell_profiler":
4 changes: 4 additions & 0 deletions cytosnake/common/errors.py
@@ -91,6 +91,10 @@ class ExtensionError(BaseValueError):
     """Raised when invalid extensions are captured"""


+class BarcodeRequiredError(BaseFileNotFound):
+    """Raised when a barcode file is required"""
+
+
 # -----------------------
 # Error handling functions
 # -----------------------
65 changes: 65 additions & 0 deletions cytosnake/guards/input_guards.py
@@ -0,0 +1,65 @@
"""
module: input_guards.py

This module handles CytoSnake's CLI logic, mostly by inspecting the user-defined
parameters passed to CytoSnake's CLI.

The logic here establishes rules about which inputs are required and which
functionality is or is not allowed.
"""
import pathlib
from typing import TypeVar

from cytosnake.common.errors import BarcodeRequiredError

# declaring user based type hinting
NameSpace = TypeVar("NameSpace")


def is_barcode_required(user_params: NameSpace) -> bool:
    """Checks whether a barcode file is required for the given inputs.

    Parameters
    ----------
    user_params : NameSpace
        argparse.Namespace object that contains all user provided parameters

    Returns
    -------
    bool
        With the given parameter inputs, True if barcodes are required else False
    """

    # getting both barcode and metadata from cli inputs
    barcode_param = user_params.barcode
    metadata_path = pathlib.Path(user_params.metadata).resolve(strict=True)

    # counting number of platemaps in metadata
    plate_maps_path = metadata_path / "platemaps"
    n_platemaps = len(list(plate_maps_path.glob("*")))

    # if the metadata directory has more than one platemap and no barcode file was
    # provided, return True. This indicates that a barcode is required
    return n_platemaps > 1 and barcode_param is None


def check_init_parameter_inputs(user_params: NameSpace) -> bool:
    """Main wrapper to check `init` mode parameter logic.

    Parameters
    ----------
    user_params : NameSpace
        argparse.Namespace object that contains all user provided parameters.

    Returns
    -------
    bool
        True if all logic checks passed

    Raises
    ------
    BarcodeRequiredError
        Raised if multiple platemaps are found but no barcode file was provided
    """

    # raise if a barcode is required but was not provided
    if is_barcode_required(user_params=user_params):
        raise BarcodeRequiredError("Barcode is required, multiple platemaps found")

    return True
Member

Double checking: is this block checking whether the barcode file is required or whether it is missing? If it's checking for whether it's missing, consider using "missing" (or similar) in the variable and object names for clarity.

Member Author

It is checking if the barcode is missing.

Are you suggesting something like this?:

if is_missing_barcode:
    BarcodeRequiredError("Barcode is missing")

Member Author

I have made some changes to the naming; let me know if it works.

Member

Thank you for the clarification! In addition to the message, I'm wondering if the exception name itself also should reflect what is "exceptional" or "erroneous". For example, BarcodeMissingError or similar. (this might require a change to the exception itself).

Member Author

Ah good point. I'll apply those changes.
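
The renaming discussed in this thread could look roughly like the sketch below. It is not the code that was merged: `BarcodeMissingError` and `is_barcode_missing` follow the names suggested above, the exception subclasses the built-in `FileNotFoundError` as an assumption (CytoSnake's own `BaseFileNotFound` is not shown in this diff), and the temporary directory only exists to make the example runnable.

```python
import argparse
import pathlib
import tempfile


class BarcodeMissingError(FileNotFoundError):
    """Raised when multiple platemaps are found but no barcode file is given."""


def is_barcode_missing(user_params: argparse.Namespace) -> bool:
    """True if more than one platemap exists and no barcode was provided."""
    metadata_path = pathlib.Path(user_params.metadata).resolve(strict=True)
    n_platemaps = len(list((metadata_path / "platemaps").glob("*")))
    return n_platemaps > 1 and user_params.barcode is None


def check_init_parameter_inputs(user_params: argparse.Namespace) -> bool:
    """Raise if the barcode is missing; otherwise report that checks passed."""
    if is_barcode_missing(user_params):
        raise BarcodeMissingError("Barcode file is missing: multiple platemaps found")
    return True


# Usage with a throwaway metadata directory holding two platemaps.
with tempfile.TemporaryDirectory() as tmp:
    platemaps = pathlib.Path(tmp) / "platemaps"
    platemaps.mkdir()
    (platemaps / "plate_1.csv").touch()
    (platemaps / "plate_2.csv").touch()

    with_barcode = argparse.Namespace(metadata=tmp, barcode="barcodes.csv")
    without_barcode = argparse.Namespace(metadata=tmp, barcode=None)

    passed = check_init_parameter_inputs(with_barcode)
    try:
        check_init_parameter_inputs(without_barcode)
        raised = False
    except BarcodeMissingError:
        raised = True
```

Naming both the predicate and the exception after the erroneous state ("missing") makes the `if ...: raise ...` pair read as a single statement of the rule.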

3 changes: 0 additions & 3 deletions workflows/rules/aggregate.smk
@@ -30,9 +30,6 @@ Returns
 """


-configfile: "configs/configuration.yaml"
-
-
 rule aggregate:
     input:
         sql_files=PLATE_DATA,
3 changes: 0 additions & 3 deletions workflows/rules/annotate.smk
@@ -26,9 +26,6 @@ Returns:
 """


-configfile: "configs/configuration.yaml"
-
-
 rule annotate:
     input:
         aggregate_profile=AGGREGATE_DATA,
3 changes: 0 additions & 3 deletions workflows/rules/cytotable_convert.smk
@@ -15,9 +15,6 @@ Returns:
 """


-configfile: "configs/configuration.yaml"
-
-
 rule convert:
     input:
         PLATE_DATA,
3 changes: 0 additions & 3 deletions workflows/rules/feature_select.smk
@@ -21,9 +21,6 @@ Returns
 """


-configfile: "configs/configuration.yaml"
-
-
 rule feature_select:
     input:
         NORMALIZED_DATA_EXPAND,
3 changes: 0 additions & 3 deletions workflows/rules/generate_consensus.smk
@@ -20,9 +20,6 @@ Return:
 """


-configfile: "configs/configuration.yaml"
-
-
 rule create_consensus:
     input:
         SELECTED_FEATURE_DATA_EXPAND,
3 changes: 0 additions & 3 deletions workflows/rules/normalize.smk
@@ -22,9 +22,6 @@ Output
 """


-configfile: "configs/configuration.yaml"
-
-
 rule normalize:
     input:
         CYTOTABLE_OUTPUT_DATA,
12 changes: 9 additions & 3 deletions workflows/workflow/cp_process.smk
@@ -1,5 +1,11 @@
-import glob
-from cytosnake.helpers import helper_funcs as hf
+"""
+# TODO:
+[ Add documentation ]
+"""
+
+
+# import workflow configurations
+configfile: "./configs/wf_configs/cp_process.yaml"


 # importing rule modules
@@ -11,7 +17,7 @@ include: "../rules/feature_select.smk"
 include: "../rules/generate_consensus.smk"


-# expected outputs from workflow
+# set expected outputs from workflow
 rule all:
     input:
         AGGREGATE_DATA_EXPAND,
3 changes: 2 additions & 1 deletion workflows/workflow/cp_process_singlecells.smk
@@ -19,7 +19,8 @@ Returns
"""


-# importing workflow configs
+# importing workflow configs [general + workflow config]
configfile: "./configs/configuration.yaml"
configfile: "./configs/wf_configs/cp_process_singlecells.yaml"
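
This workflow declares two `configfile:` directives. As far as I understand Snakemake, the files are loaded in order into the single `config` dictionary, with later values overriding earlier ones. The sketch below approximates that behaviour with a plain recursive merge; it is not Snakemake's own code, `merge_configs` is a hypothetical helper, and the two dictionaries are made-up stand-ins for the general and workflow-specific files.

```python
def merge_configs(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # both sides are mappings: merge them key by key
            merged[key] = merge_configs(merged[key], value)
        else:
            # scalars, lists, and new keys are taken from the override
            merged[key] = value
    return merged


# Hypothetical contents standing in for the two config files.
general = {"data_dir": "data", "aggregate_configs": {"params": {"operation": "mean"}}}
workflow = {"aggregate_configs": {"params": {"operation": "median"}}}

config = merge_configs(general, workflow)
```

Under this reading, keys that appear in both files should be kept deliberately distinct or deliberately overridden, since the workflow-specific file is loaded last.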

