Decoupling rule modules into individual components #33

Merged: 8 commits, Mar 27, 2023

axiomcura

About this PR

This PR focuses on separating rule modules into individual components. In the current version, some rule modules contain multiple rules that carry out different processes:

preprocess.smk

rule aggregate:
    input:
        sql_files=PLATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        aggregate_profile=AGGREGATE_DATA,
        cell_counts=CELL_COUNTS,
    log:
        "logs/aggregate_{file_name}.log",
    conda:
        "../envs/cytominer_env.yaml"
    params:
        aggregate_config=config["config_paths"]["single_cell"],
    script:
        "../scripts/aggregate_cells.py"

rule annotate:
    input:
        aggregate_profile=AGGREGATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        ANNOTATED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/annotate_{file_name}.log",
    params:
        annotate_config=config["config_paths"]["annotate"],
    script:
        "../scripts/annotate.py"

rule normalize:
    input:
        ANNOTATED_DATA,
    output:
        NORMALIZED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/normalized_{file_name}.log",
    params:
        normalize_config=config["config_paths"]["normalize"],
    script:
        "../scripts/normalize.py"

This is the preprocess.smk module currently implemented in CytoSnake. The rules aggregate, annotate, and normalize are tightly coupled because each rule expects the outputs of the previous one.

Tightly coupling rules inside one module forces developers to repeat the same code in their own modules.

For example, because the normalize rule is deeply coupled within preprocess.smk, a user who needs only normalization has to create another rule module that contains the normalization process.

In addition, this makes rule modules hard to reuse in larger workflows. If you’re designing a larger workflow that requires only normalization, you have to import the whole of preprocess.smk, which is not ideal. Decoupling solves this problem.

Separating modules into individual components

Separating each rule into its own independent module has its advantages: it removes repeated code and increases extensibility.

After decoupling preprocess.smk, the modules look like this:

aggregate.smk

rule aggregate:
    input:
        sql_files=PLATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        aggregate_profile=AGGREGATE_DATA,
        cell_counts=CELL_COUNTS,
    log:
        "logs/aggregate_{file_name}.log",
    conda:
        "../envs/cytominer_env.yaml"
    params:
        aggregate_config=config["config_paths"]["single_cell"],
    script:
        "../scripts/aggregate_cells.py"

annotate.smk

rule annotate:
    input:
        aggregate_profile=AGGREGATE_DATA,
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        ANNOTATED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/annotate_{file_name}.log",
    params:
        annotate_config=config["config_paths"]["annotate"],
    script:
        "../scripts/annotate.py"

normalize.smk

rule normalize:
    input:
        ANNOTATED_DATA,
    output:
        NORMALIZED_DATA,
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/normalized_{file_name}.log",
    params:
        normalize_config=config["config_paths"]["normalize"],
    script:
        "../scripts/normalize.py"

Now that each component stands on its own, developers can import these modules into their workflows without any problem!
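
A minimal sketch of what that can look like in a workflow that needs only normalization. The Snakefile location, the `rules/` directory, `config.yaml`, the `rules/common.smk` helper module, and `PLATE_NAMES` are all assumptions made for illustration and are not names taken from this PR:

```snakemake
# Snakefile: hypothetical workflow entry point (file layout is an assumption)
configfile: "config.yaml"  # assumed to provide config["config_paths"]["normalize"]

# Assumed helper module defining ANNOTATED_DATA, NORMALIZED_DATA and PLATE_NAMES.
include: "rules/common.smk"

# Only the normalize module is included; aggregate.smk and annotate.smk are left out,
# so the annotated profiles must already exist on disk.
include: "rules/normalize.smk"

rule all:
    input:
        # Assumes NORMALIZED_DATA is a path pattern containing the {file_name} wildcard.
        expand(NORMALIZED_DATA, file_name=PLATE_NAMES),
```

Adding `include: "rules/annotate.smk"` or `include: "rules/aggregate.smk"` later would extend the same workflow upstream without touching normalize.smk.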

Another feature this PR introduces is the ability to inherit modules. Here’s an example: let’s say we’re creating a new module, but we also want it to be tightly coupled with the normalization method.

This is easily solved by including normalize.smk in your new rule module.

new_rule.smk

# let's inherit the normalize module
include: "./normalize.smk"

rule new_rule:
    input:
        NORMALIZED_DATA,  # produced by the included normalize rule
    output:
        NEW_RULE_OUTPUT,  # hypothetical output target for this example
    script:
        "../scripts/new_script.py"

The include directive is similar to Python’s import statement: it brings the rules defined in the included module into new_rule.smk.

This is beneficial because users do not have to write a new module or repeat code within new_rule.smk.
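
As a rough sketch of how this plays out one level up, a workflow that includes new_rule.smk also gets the normalization rule, because Snakemake resolves an include relative to the file in which it appears. The entry-point Snakefile, `config.yaml`, and the `rules/` layout below are assumptions for illustration only:

```snakemake
# Snakefile: hypothetical workflow entry point (file layout is an assumption)
configfile: "config.yaml"  # hypothetical config file

# new_rule.smk itself includes the normalization module, so the normalize rule
# is defined in this workflow too, without a separate include line here.
include: "rules/new_rule.smk"
```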

@axiomcura axiomcura requested a review from gwaybio March 27, 2023 21:57

@gwaybio gwaybio left a comment

really great, straightforward PR! 🎉

@axiomcura

Great, merging!

@axiomcura axiomcura merged commit 9612414 into WayScience:main Mar 27, 2023
@axiomcura axiomcura deleted the module-sep branch March 28, 2023 17:18