Decoupling rule modules into individual components #33

axiomcura · 2023-03-27T21:56:35Z

About this PR

This PR focuses on separating rules into individual components. In the current version, some rule modules contain multiple rules that carry out different processes:

preprocess.smk

rule aggregate:
    input:
        sql_files=PLATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        aggregate_profile=AGGREGATE_DATA,
        cell_counts=CELL_COUNTS,
    log:
        "logs/aggregate_{file_name}.log",
    conda:
        "../envs/cytominer_env.yaml"
    params:
        aggregate_config=config["config_paths"]["single_cell"],
    script:
        "../scripts/aggregate_cells.py"

rule annotate:
    input:
        aggregate_profile=AGGREGATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        ANNOTATED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/annotate_{file_name}.log",
    params:
        annotate_config=config["config_paths"]["annotate"],
    script:
        "../scripts/annotate.py"

rule normalize:
    input:
        ANNOTATED_DATA,
    output:
        NORMALIZED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/normalized_{file_name}.log",
    params:
        normalize_config=config["config_paths"]["normalize"],
    script:
        "../scripts/normalize.py"

This is the preprocess.smk module that is currently implemented in CytoSnake. The rules aggregate, annotate, and normalize are tightly coupled because each rule expects the outputs of the previous rule.

Tightly coupling rules inside a single module forces developers to repeat the same code in their own modules.

For example, because the normalize rule is tightly coupled to the other rules in preprocess.smk, a user who needs only normalization has to create another rule module that contains the normalization process.

In addition, this makes rule modules hard to reuse in larger workflows. If you are designing a larger workflow that requires normalization, you have to import the whole preprocess.smk into your workflow, which is not ideal. Decoupling solves this problem.
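The coupling described above comes from how rules are chained: a rule's inputs are matched against the outputs of other rules. Here is a minimal Python sketch of that matching. It is a toy model rather than Snakemake's actual resolver, and the file names are hypothetical stand-ins for path constants such as PLATE_DATA and AGGREGATE_DATA.

```python
# Toy model: rules are linked when one rule's input is another rule's output.
# File names are hypothetical stand-ins for the workflow's path constants.
RULES = {
    "aggregate": {"input": ["plate.sqlite"], "output": ["aggregate.csv"]},
    "annotate": {"input": ["aggregate.csv"], "output": ["annotated.csv"]},
    "normalize": {"input": ["annotated.csv"], "output": ["normalized.csv"]},
}


def execution_order(target, rules):
    """Return the rule names needed to build `target`, upstream rules first.

    Assumes the rules contain no cycles.
    """
    # Map every output file to the rule that produces it.
    producers = {out: name for name, rule in rules.items() for out in rule["output"]}
    order = []

    def visit(path):
        name = producers.get(path)
        if name is None or name in order:
            return  # a raw source file, or a rule that is already scheduled
        for dependency in rules[name]["input"]:
            visit(dependency)
        order.append(name)

    visit(target)
    return order


print(execution_order("normalized.csv", RULES))
```

In this sketch, the chain depends only on matching file names and not on which file a rule is defined in. That is why the rules can be split into separate modules and still link up.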

Separating modules into individual components

Separating each rule into its own independent module has advantages: it removes repeated code and increases extensibility.

After decoupling preprocess.smk, the modules look like this:

aggregate.smk

rule aggregate:
    input:
        sql_files=PLATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        aggregate_profile=AGGREGATE_DATA,
        cell_counts=CELL_COUNTS,
    log:
        "logs/aggregate_{file_name}.log",
    conda:
        "../envs/cytominer_env.yaml"
    params:
        aggregate_config=config["config_paths"]["single_cell"],
    script:
        "../scripts/aggregate_cells.py"

annotate.smk

rule annotate:
    input:
        aggregate_profile=AGGREGATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        ANNOTATED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/annotate_{file_name}.log",
    params:
        annotate_config=config["config_paths"]["annotate"],
    script:
        "../scripts/annotate.py"

normalize.smk

rule normalize:
    input:
        ANNOTATED_DATA,
    output:
        NORMALIZED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/normalized_{file_name}.log",
    params:
        normalize_config=config["config_paths"]["normalize"],
    script:
        "../scripts/normalize.py"

Now that each component stands on its own, developers can import these modules into their workflows without any problem!

An additional feature this PR introduces is the ability to inherit modules. For example, say we are creating a new module and want it to be tightly coupled with the normalization method.

This is easily solved by inheriting normalize.smk in your new rule module.

new_rule.smk

# let's inherit the normalization module
include: "./normalize.smk"

rule new_rule:
    input:
        ANNOTATED_DATA,
    output:
        NORMALIZED_DATA,
    script:
        "../scripts/new_script.py"

The include directive is similar to Python's import statement: it brings the rules and variables defined in the included module into the namespace of new_rule.smk.

This is beneficial because users do not have to write a new module or repeat code within new_rule.smk.
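To make the namespace idea concrete, here is a small Python sketch of an include-style directive. It is only a toy model and not Snakemake's implementation. The `include` helper, the `RULES` registry, and the file names are all hypothetical and exist only for illustration.

```python
# Toy model of an include-style directive: the included source runs in the
# caller's namespace, so everything it defines becomes visible to the caller.
# Illustrative only; this is not Snakemake's implementation.


def include(source, namespace):
    """Execute `source` inside `namespace` (a dict acting as module globals)."""
    exec(compile(source, "<included module>", "exec"), namespace)


# Hypothetical stand-in for the contents of a normalization module.
NORMALIZE_SOURCE = """
RULES["normalize"] = {"input": "annotated.csv", "output": "normalized.csv"}
"""

# The including module starts with its own, empty rule registry ...
workflow = {"RULES": {}}
include(NORMALIZE_SOURCE, workflow)

# ... and then adds its own rule next to the inherited one.
workflow["RULES"]["new_rule"] = {"input": "normalized.csv", "output": "new_output.csv"}

print(sorted(workflow["RULES"]))
```

After the include, the inherited rule and the new rule sit side by side in one registry, so the new module never has to restate the normalization rule.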

gwaybio

really great, straightforward PR! 🎉

axiomcura · 2023-03-27T22:10:19Z

Great, merging!

axiomcura added 8 commits March 14, 2023 13:44

fixed minor pathing bugs

74f9959

separated cp_process module

a316cd8

update logs documentation

193cbd7

added documentation

df612c9

edit typos

5b6644f

update cp_process workflow

dc7a9d4

update pycytominer version

2490d7d

file typo fixed

48802be

axiomcura requested a review from gwaybio March 27, 2023 21:57

gwaybio approved these changes Mar 27, 2023

axiomcura merged commit 9612414 into WayScience:main Mar 27, 2023

axiomcura deleted the module-sep branch March 28, 2023 17:18
