PODP mode and local data mode #117

CunliangGeng · 2023-02-23T09:56:39Z

To enable the local mode and the PODP mode, the data-arranging pipeline has been redesigned. It will be implemented in a new class, `DatasetArranger`.

flowchart LR

    ConfigError[Dynaconf config validation error]
    DataError[Data validation error]
    UseIt[Use the data]
    Download[Download or generate, and remove existing data if relevant]

    A[GNPS, antiSMASH and BigSCape] --> B{Pass Dynaconf config validation?}
    B -->|No | ConfigError
    
    B -->|Yes| G{Is the mode PODP?}
    
    G -->|No, local mode| G1{Does local data dir exist?}
    G1 -->|No | DataError
    G1 -->|Yes| H{Pass data validation?}
    H --> |No | DataError
    H --> |Yes| UseIt 

    G -->|Yes, podp mode| G2{Does local data dir exist?}
    G2 --> |No | Download
    G2 --> |Yes | J{Pass data validation?}
    J -->|No | Download --> |at most 2 attempts| J
    J -->|Yes| UseIt

 
    Mibig[MIBiG - always downloaded, users cannot provide local data] --> M0{Pass Dynaconf config validation?}
    M0 -->|No | M01[Dynaconf config validation error]
    M0 -->|Yes | MibigDownload[Remove existing data if applicable and download data]


    podp[PODP project json file] --> P{Does the file exist?}
    P --> |No | P0[Download the file] --> P1
    P --> |Yes| P1[Validate the file]

    StrainMappings[Strain mappings file - required] --> SM{Is the mode PODP?}
    SM --> |No |SM0[Validate the file]
    SM --> |Yes|SM1[Generate the file] --> SM0

    StrainsSelected[Strains selected file - optional] --> SS[Validate the file if it exists]
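The decision logic in the diagram for GNPS, antiSMASH and BigSCape data can be sketched as below. This is a minimal illustration under stated assumptions, not the `DatasetArranger` code: the `Mode` enum, the `arrange_data` function and the `validate`/`download` callables are hypothetical, Dynaconf config validation is assumed to have already passed, and "at most 2 attempts" is read as at most two download-then-validate rounds.

```python
from enum import Enum
from pathlib import Path
from typing import Callable


class Mode(Enum):
    # Hypothetical stand-in for the mode set in the config file
    LOCAL = "local"
    PODP = "podp"


class DataError(Exception):
    """Raised when data is missing or fails validation."""


def arrange_data(
    mode: Mode,
    data_dir: Path,
    validate: Callable[[Path], bool],
    download: Callable[[Path], None],
    max_attempts: int = 2,
) -> Path:
    """Return data_dir once it holds valid data, following the flowchart.

    `download` is assumed to remove any existing data in data_dir before
    downloading or generating it.
    """
    if mode is Mode.LOCAL:
        # Local mode: users must provide the data, nothing is downloaded
        if not data_dir.exists():
            raise DataError(f"{data_dir} does not exist")
        if not validate(data_dir):
            raise DataError(f"{data_dir} failed validation")
        return data_dir

    # PODP mode: reuse existing data if valid, otherwise (re)download
    if data_dir.exists() and validate(data_dir):
        return data_dir
    for _ in range(max_attempts):
        download(data_dir)
        if validate(data_dir):
            return data_dir
    raise DataError(f"{data_dir} still invalid after {max_attempts} downloads")
```

Passing `validate` and `download` in as callables keeps the mode logic identical for every data type; only the per-type checks and download steps differ.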


This is a big PR that implements the data-arranging pipelines, which enable the local and PODP modes. Arranging data means:

- creating data folders in the `root_dir`
- downloading the dataset if needed (e.g. in PODP mode)
- validating the dataset, whether downloaded or provided by users

In short, it covers all the steps needed to make the data ready for loading. The pipelines for the different data types are shown in the diagram in #117.

To keep the data-arranging workflow simple, we use a fixed project directory structure (see #163) with fixed directory and file names (see `globals.py`). To use nplinker, users are required to:

- create a `root_dir` manually and use it as the root directory of the nplinker project
- provide a config file `nplinker.toml` and put it in the `root_dir`

**Major changes**

- Added file `arranger.py`, including the class `DatasetArranger` and some validation functions, which implement the data-arranging pipelines
- Cleaned, removed or updated some files to make the arranger work (some may need further refactoring in future PRs):
  - cleaned `runbigscape.py`
  - deleted `downloader.py` and its tests, which are replaced by `DatasetArranger`
  - updated `loader.py` and `nplinker.py` to use `DatasetArranger`
- Added integration tests for the arranger (tests passed)
- Created `nplinker_local_mode.toml`
- Updated `tests/conftest.py`
- Updated `test_nplinker_local.py` to test the local mode

Tests on the PODP mode also passed on my local machine. Because running BigSCape is costly, those tests will be added to the codebase in later PRs.
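The two user-side prerequisites can be checked up front, as in the minimal sketch below. It assumes a hypothetical helper `check_root_dir` that is not part of nplinker; only the config file name `nplinker.toml` and the requirement that it sits in `root_dir` come from the description.

```python
from pathlib import Path
from typing import Union

# Config file name from the PR description; the helper below is illustrative only.
CONFIG_FILE_NAME = "nplinker.toml"


def check_root_dir(root_dir: Union[str, Path]) -> Path:
    """Hypothetical helper: verify the user-created root_dir and return its config path."""
    root = Path(root_dir)
    # The root directory must be created manually by the user
    if not root.is_dir():
        raise FileNotFoundError(f"root_dir '{root}' does not exist, please create it first")
    config_file = root / CONFIG_FILE_NAME
    # The config file must sit directly inside root_dir
    if not config_file.is_file():
        raise FileNotFoundError(f"config file '{config_file}' not found in root_dir")
    return config_file
```

Failing early with a clear message here keeps the later arranging steps free of repeated path checks.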

CunliangGeng added the `GEN` (genomics related issues) and `MET` (metabolomics related issues) labels on Feb 23, 2023

CunliangGeng added this to the "refactor codebase" milestone on Feb 23, 2023

CunliangGeng changed the title from ~~Refactor loader.py~~ to PODP mode and local data mode on Dec 22, 2023

CunliangGeng mentioned this issue on Feb 23, 2024 in "Refactor initialisation of project root and data folders [Track issue]" #163 (closed, 12 tasks)

CunliangGeng self-assigned this on Feb 27, 2024

CunliangGeng mentioned this issue on Mar 1, 2024 in "add class DatasetArranger" #215 (merged)

CunliangGeng linked pull request #215 on Mar 5, 2024, which closes this issue

CunliangGeng closed this as completed on Mar 5, 2024
