PODP mode and local data mode #117

Closed
Tracked by #163
CunliangGeng opened this issue Feb 23, 2023 · 0 comments · Fixed by #215
CunliangGeng commented Feb 23, 2023

To support both local mode and PODP mode, the data-arranging pipeline has been redesigned. It will be implemented in a new class, `DatasetArranger`.

flowchart LR

    ConfigError[Dynaconf config validation error]
    DataError[Data validation error]
    UseIt[Use the data]
    Download[Download or generate, and remove existing data if relevant]

    A[GNPS, antiSMASH and BiG-SCAPE] --> B{Pass Dynaconf config validation?}
    B -->|No | ConfigError
    
    B -->|Yes| G{Is the mode PODP?}
    
    G -->|No, local mode| G1{Does local data dir exist?}
    G1 -->|No | DataError
    G1 -->|Yes| H{Pass data validation?}
    H --> |No | DataError
    H --> |Yes| UseIt 

    G -->|Yes, podp mode| G2{Does local data dir exist?}
    G2 --> |No | Download
    G2 --> |Yes | J{Pass data validation?}
    J -->|No | Download --> |try max 2 times| J
    J -->|Yes| UseIt

 
    Mibig[MIBiG - always download, users not allowed to provide local data] --> M0{Pass Dynaconf config validation?}
    M0 -->|No | M01[Dynaconf config validation error]
    M0 -->|Yes | MibigDownload[Remove existing data if applicable and download data]


    podp[PODP project json file] --> P{Does the file exist?}
    P --> |No | P0[Download the file] --> P1
    P --> |Yes| P1[Validate the file]

    StrainMappings[Strain mappings file - required] --> SM{Is the mode PODP?}
    SM --> |No |SM0[Validate the file]
    SM --> |Yes|SM1[Generate the file] --> SM0

    StrainsSelected[Strains selected file - optional] --> SS[Validate the file if it exists]
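
The GNPS, antiSMASH and BiG-SCAPE branch of the flowchart can be sketched in Python roughly as follows. This is only an illustration, not the actual `DatasetArranger` code: the function `arrange`, the `validate` and `download` callbacks, and the `DataValidationError` class are hypothetical names. Raising an error after the second failed download is also an assumption, since the diagram does not say what happens once the retries are exhausted.

```python
from pathlib import Path
from typing import Callable


class DataValidationError(Exception):
    """Hypothetical stand-in for the 'Data validation error' node."""


def arrange(
    data_dir: Path,
    mode: str,
    validate: Callable[[Path], bool],
    download: Callable[[Path], None],
    max_attempts: int = 2,
) -> None:
    """Make `data_dir` ready for use, following the flowchart branches."""
    if mode == "local":
        # Local mode: the user must provide valid data; nothing is downloaded.
        if not data_dir.exists():
            raise DataValidationError(f"{data_dir} does not exist")
        if not validate(data_dir):
            raise DataValidationError(f"{data_dir} failed validation")
        return

    if mode != "podp":
        raise ValueError(f"unknown mode: {mode!r}")

    # PODP mode: use existing data if it validates, otherwise download.
    if data_dir.exists() and validate(data_dir):
        return
    for _ in range(max_attempts):
        # `download` is expected to remove stale data before fetching.
        download(data_dir)
        if validate(data_dir):
            return
    # Assumption: give up with an error once the retries are used up.
    raise DataValidationError(
        f"{data_dir} still invalid after {max_attempts} download attempts"
    )
```

In this sketch, Dynaconf config validation is assumed to have passed before `arrange` is called.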
@CunliangGeng CunliangGeng added GEN genomics related issues MET metabolomics related issues labels Feb 23, 2023
@CunliangGeng CunliangGeng added this to the refactor codebase milestone Feb 23, 2023
@CunliangGeng CunliangGeng changed the title from "Refactor loader.py" to "PODP mode and local data mode" Dec 22, 2023
@CunliangGeng CunliangGeng self-assigned this Feb 27, 2024
CunliangGeng added a commit that referenced this issue Mar 5, 2024
This is a large PR that implements the data-arranging pipelines, which enable the local and PODP modes.

Arranging data means:
- creating data folders in the `root_dir`
- downloading the dataset if needed (e.g. for PODP mode)
- validating the dataset, whether downloaded or provided by users

In short, it covers every step needed to make the data ready for loading.

The arranging pipelines for the different data types are shown in the diagram in #117.

To keep the data-arranging workflow simple, we use a fixed project directory structure (see #163) with fixed directory and file names (see `globals.py`).

To use nplinker, users must:
- create a `root_dir` manually and use it as the root directory of the nplinker project
- provide a config file `nplinker.toml` and put it in the `root_dir`
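
As a minimal sketch of these two requirements, the check below verifies that the root directory exists and contains the config file. The helper name `check_root_dir` is hypothetical and not part of nplinker; only the file name `nplinker.toml` comes from the text above.

```python
from pathlib import Path

CONFIG_NAME = "nplinker.toml"  # config file name stated in the PR description


def check_root_dir(root_dir: str) -> Path:
    """Return the config file path if the project root is set up as required."""
    root = Path(root_dir)
    if not root.is_dir():
        # The root directory must be created manually by the user.
        raise NotADirectoryError(f"root_dir does not exist: {root}")
    config = root / CONFIG_NAME
    if not config.is_file():
        raise FileNotFoundError(f"missing config file: {config}")
    return config
```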

**Major changes**
- Added `arranger.py`, which contains the class `DatasetArranger` and validation functions that implement the data-arranging pipelines

- Cleaned, removed, or updated some files to make the arranger work (some may need further refactoring in future PRs)
    - Cleaned `runbigscape.py`
    - Deleted `downloader.py` and its tests, which are replaced by `DatasetArranger`
    - Updated `loader.py` and `nplinker.py` to use the `DatasetArranger`

- Added integration tests for the arranger (tests passed)
    - Created `nplinker_local_mode.toml`
    - Updated `tests/conftest.py`
    - Updated `test_nplinker_local.py` to test the local mode
 

Tests for PODP mode also passed on my local machine. Because running BiG-SCAPE is costly, those tests will be added to the codebase in follow-up PRs.
@CunliangGeng CunliangGeng linked a pull request Mar 5, 2024 that will close this issue