To enable local mode and podp mode, the data-arranging pipeline has been redesigned. The pipeline will be implemented in a new class, `DatasetArranger`.
flowchart LR
ConfigError[Dynaconf config validation error]
DataError[Data validation error]
UseIt[Use the data]
Download[Download or generate, and remove existing data if relevant]
A[GNPS, antiSMASH and BigSCape] --> B{Pass Dynaconf config validation?}
B -->|No | ConfigError
B -->|Yes| G{Is the mode PODP?}
G -->|No, local mode| G1{Does local data dir exist?}
G1 -->|No | DataError
G1 -->|Yes| H{Pass data validation?}
H --> |No | DataError
H --> |Yes| UseIt
G -->|Yes, podp mode| G2{Does local data dir exist?}
G2 --> |No | Download
G2 --> |Yes | J{Pass data validation?}
J -->|No | Download --> |try max 2 times| J
J -->|Yes| UseIt
Mibig[mibig - always download, users not allowed to provide local data] --> M0{Pass Dynaconf config validation?}
M0 -->|No | M01[Dynaconf config validation error]
M0 -->|Yes | MibigDownload[Remove existing data if applicable and download data]
podp[PODP project json file] --> P{Does the file exist?}
P --> |No | P0[Download the file] --> P1
P --> |Yes| P1[Validate the file]
StrainMappings[Strain mappings file - required] --> SM{Is the mode PODP?}
SM --> |No |SM0[Validate the file]
SM --> |Yes|SM1[Generate the file] --> SM0
StrainsSelected[Strains selected file - optional] --> SS[Validate the file if it exists]
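
The first branch of the flowchart (GNPS, antiSMASH and BigSCape) can be sketched in Python roughly as follows. This is only an illustration of the decision logic, not the actual `DatasetArranger` code: the function `arrange_dataset`, its parameters, and the `DataValidationError` exception are hypothetical names, and the download and validation steps are passed in as callables. It assumes that "try max 2 times" means at most two download attempts, and that an error is raised if the data is still invalid afterwards (the flowchart does not show that case).

```python
from pathlib import Path
from typing import Callable


class DataValidationError(Exception):
    """Hypothetical error raised when a dataset cannot be made valid."""


def arrange_dataset(
    data_dir: Path,
    mode: str,
    is_valid: Callable[[Path], bool],
    download: Callable[[Path], None],
    max_attempts: int = 2,
) -> Path:
    """Return data_dir once it holds valid data, following the flowchart.

    All names are illustrative. Config validation is assumed to have
    passed before this function is called.
    """
    if mode == "local":
        # Local mode: the user must provide the data; nothing is downloaded.
        if not data_dir.exists():
            raise DataValidationError(f"{data_dir} does not exist")
        if not is_valid(data_dir):
            raise DataValidationError(f"{data_dir} failed validation")
        return data_dir

    if mode == "podp":
        # PODP mode: reuse existing valid data, otherwise download or generate it.
        if data_dir.exists() and is_valid(data_dir):
            return data_dir
        for _ in range(max_attempts):
            download(data_dir)  # expected to replace any existing data
            if is_valid(data_dir):
                return data_dir
        # Assumption: give up with an error once the attempts are exhausted.
        raise DataValidationError(
            f"{data_dir} still invalid after {max_attempts} download attempts"
        )

    raise ValueError(f"unknown mode: {mode!r}")
```

Passing the validator and downloader as arguments keeps the branching logic testable without network access or a BigSCape run.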
This is a large PR that implements the data-arranging pipelines, which enable the local and podp modes.
Arranging data means:
- creating data folders in the `root_dir`
- downloading the dataset if needed (e.g. for podp mode)
- validating the dataset, whether downloaded or provided by the user
In short, it covers every step needed to make the data ready for loading.
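
As a rough illustration of the first step, the data folders could be created idempotently under `root_dir`. The folder names below are placeholders chosen for this sketch; the actual fixed names are defined in `globals.py` and are not reproduced here. The function `create_data_dirs` is likewise hypothetical.

```python
from pathlib import Path

# Placeholder names for illustration only; the real ones live in globals.py.
HYPOTHETICAL_DATA_DIRS = ("gnps", "antismash", "bigscape", "mibig")


def create_data_dirs(root_dir: Path, names=HYPOTHETICAL_DATA_DIRS) -> list[Path]:
    """Create one sub-folder per dataset under root_dir and return their paths."""
    if not root_dir.is_dir():
        # root_dir itself is expected to be created manually by the user.
        raise FileNotFoundError(f"root_dir does not exist: {root_dir}")
    created = []
    for name in names:
        path = root_dir / name
        path.mkdir(exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```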
The pipelines for arranging each type of data are shown in the diagram in #117.
To keep the data-arranging workflow simple, we use a fixed project directory structure (see #163) with fixed directory and file names (see `globals.py`).
To use nplinker, users are required to:
- create a `root_dir` manually and use it as the root directory of the nplinker project
- provide a config file `nplinker.toml` and put it in the `root_dir`
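
A minimal sketch of how these two requirements might be checked before any data is arranged. The function name `find_config` is hypothetical and is not part of nplinker; only the file name `nplinker.toml` comes from the description above.

```python
from pathlib import Path

CONFIG_FILENAME = "nplinker.toml"


def find_config(root_dir: Path) -> Path:
    """Return the path to nplinker.toml inside root_dir, or raise if missing."""
    if not root_dir.is_dir():
        raise NotADirectoryError(f"root_dir is not a directory: {root_dir}")
    config_path = root_dir / CONFIG_FILENAME
    if not config_path.is_file():
        raise FileNotFoundError(f"expected config file at {config_path}")
    return config_path
```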
**Major changes**
- Added the file `arranger.py`, which contains the class `DatasetArranger` and some validation functions that implement the data-arranging pipelines
- Cleaned, removed, or updated some files to make the arranger work (some may need further refactoring in future PRs)
  - Cleaned `runbigscape.py`
  - Deleted `downloader.py` and its tests, which are replaced by `DatasetArranger`
  - Updated `loader.py` and `nplinker.py` to use the `DatasetArranger`
- Added integration tests for the arranger (tests passed)
  - Created `nplinker_local_mode.toml`
  - Updated `tests/conftest.py`
  - Updated `test_nplinker_local.py` to test the local mode
Tests for podp mode also passed on my local machine. Because running bigscape is costly, those tests will be added to the codebase in later PRs.