Home
A repository of automations that download external bioinformatics datasets.
For each bioinformatics database source, this package can be set up to run an automation that checks whether a new version of the database is available. As soon as it detects a new release, the automation downloads it locally.
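The check-then-download cycle described above can be sketched in shell as follows. This is a minimal sketch, not the package's actual code. It assumes that the last downloaded release is stored as plain text in a file named current_release_NUMBER in the source's root directory. It also assumes that the latest release string and the download URL are obtained elsewhere, because both are source-specific. The function names is_new_release and fetch_release are hypothetical. The wget options used are -q (quiet) and -P (directory prefix).

```shell
#!/usr/bin/env sh
# Hypothetical helpers sketching the "check, then download" cycle.
# Assumption: the last downloaded release is stored as plain text in
# <source_root>/current_release_NUMBER.

# is_new_release SOURCE_ROOT LATEST_RELEASE
# Succeeds (exit status 0) when LATEST_RELEASE differs from the stored
# release, or when no release has been recorded yet.
is_new_release() {
    release_file="$1/current_release_NUMBER"
    if [ ! -f "$release_file" ]; then
        return 0    # first run: nothing downloaded yet
    fi
    stored=$(cat "$release_file")   # $(...) drops the trailing newline
    [ "$stored" != "$2" ]
}

# fetch_release URL DEST_DIR
# Downloads URL into DEST_DIR with wget (-q: quiet, -P: directory prefix).
fetch_release() {
    mkdir -p "$2" && wget -q -P "$2" "$1"
}

# Example usage (hypothetical release value and URL):
# if is_new_release "$EXTERNAL_DATA_BASE/ensembl" 111; then
#     fetch_release "https://example.org/release-111/file.gz" "$EXTERNAL_DATA_BASE/ensembl/111"
# fi
```

Comparing against the stored release, rather than against a timestamp, keeps the check independent of how each source publishes its files.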
For each database source, the automation creates a root directory, named after the database, under the standard base path set in the main configuration (EXTERNAL_DATA_BASE). The organization of files under these root directories depends on how a given data source publishes its data.
Under each data source root directory, you will find:
- A file (current_release_NUMBER) that stores the latest release of the data source
- A directory for each version downloaded
- A symbolic link named "current" that points to the latest version
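The directory layout above can be reproduced with a short shell function. This is a hedged sketch of how a new release might be recorded, not the package's implementation. The function name install_release is hypothetical. The sketch also assumes that each version directory is named after its release string and that "current" is a relative symbolic link inside the source root.

```shell
#!/usr/bin/env sh
# install_release BASE SOURCE RELEASE  (hypothetical helper)
# Lays out BASE/SOURCE/ as described above:
#   BASE/SOURCE/RELEASE/                  one directory per downloaded version
#   BASE/SOURCE/current_release_NUMBER    text file holding the latest release
#   BASE/SOURCE/current -> RELEASE        symbolic link to the latest version
install_release() {
    src_root="$1/$2"
    mkdir -p "$src_root/$3" || return 1
    # Record the latest release.
    printf '%s\n' "$3" > "$src_root/current_release_NUMBER"
    # -s: symbolic, -f: replace an existing link, -n: do not follow an
    # existing link to a directory (supported by GNU and BSD/macOS ln).
    ln -sfn "$3" "$src_root/current"
}

# Example usage, with EXTERNAL_DATA_BASE taken from the main configuration:
# install_release "$EXTERNAL_DATA_BASE" ensembl 111
```

Without `-n`, re-pointing "current" would create a new link inside the previous version directory instead of replacing "current" itself.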
Under the data source root directory, files are organized by dataset, or as specified by the DATASETS and/or TAXA variables in the data source's configuration file.
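How DATASETS and TAXA shape the tree is defined per source. The following is a purely illustrative sketch of how the target directories could be created. It assumes that the source configuration sets both variables as space-separated lists and that directories nest as RELEASE_DIR/DATASET/TAXON. That nesting, the function name make_dataset_dirs, and the example values are all hypothetical.

```shell
#!/usr/bin/env sh
# make_dataset_dirs RELEASE_DIR  (hypothetical helper)
# Assumes DATASETS and TAXA are space-separated lists, e.g. set by sourcing
# the data source configuration file. Either one may be empty.
make_dataset_dirs() {
    if [ -z "$DATASETS" ]; then
        # TAXA only: one directory per taxon.
        for taxon in $TAXA; do
            mkdir -p "$1/$taxon"
        done
        return 0
    fi
    for dataset in $DATASETS; do
        if [ -z "$TAXA" ]; then
            mkdir -p "$1/$dataset"            # DATASETS only
        else
            for taxon in $TAXA; do
                mkdir -p "$1/$dataset/$taxon" # DATASETS and TAXA
            done
        fi
    done
}

# Example usage with hypothetical values:
# DATASETS="genomes annotations"
# TAXA="human mouse"
# make_dataset_dirs "$EXTERNAL_DATA_BASE/ensembl/current"
```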
This package has been tested on Linux and macOS.
The main dependency of this package is the wget utility. If you want to extract downloaded datasets, make sure the tar, unzip, and gunzip utilities are installed as well:
- wget
- tar
- unzip
- gunzip
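Each of the extraction tools handles a different archive format. The sketch below shows one way to dispatch on the file extension. It is an assumption about how extraction might be wired, not the package's code, and the function name extract_file is hypothetical. The options used are standard: `tar -x` (`-z` for gzip, `-C` for the target directory), `unzip -o -q -d`, and `gunzip -c`.

```shell
#!/usr/bin/env sh
# extract_file ARCHIVE DEST_DIR  (hypothetical helper)
# Picks the extraction tool from the file extension. Order matters:
# *.tar.gz must be matched before the plain *.gz case.
extract_file() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    case "$archive" in
        *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
        *.tar)          tar -xf "$archive" -C "$dest" ;;
        *.zip)          unzip -o -q "$archive" -d "$dest" ;;
        *.gz)
            # Decompress into DEST_DIR and keep the original .gz file.
            name=$(basename "$archive" .gz)
            gunzip -c "$archive" > "$dest/$name" ;;
        *)
            echo "extract_file: unsupported format: $archive" >&2
            return 1 ;;
    esac
}

# Example usage (hypothetical file name):
# extract_file "$EXTERNAL_DATA_BASE/ensembl/111/genome.fa.gz" "$EXTERNAL_DATA_BASE/ensembl/111"
```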
To verify that these tools are installed, run the following commands:
- To check wget, run: which wget
- To check tar, run: which tar
- To check unzip, run: which unzip
- To check gunzip, run: which gunzip
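The four checks above can be run in one pass. This sketch substitutes the POSIX `command -v` builtin for which, because which is an external program whose behavior varies between systems. The function name check_deps is hypothetical.

```shell
#!/usr/bin/env sh
# check_deps TOOL...  (hypothetical helper)
# Reports each tool that cannot be found on PATH and returns 1 if any is missing.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
# check_deps wget tar unzip gunzip && echo "all dependencies found"
```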
Each source is a sub-directory that contains:
Each source name is all lowercase and matches the name of the source's download root directory. Where applicable, different versions of a source are downloaded under that same root directory.
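Because source names are lowercase and identical to their root directory names, a source's root path can be derived from its name. This is a minimal sketch that assumes EXTERNAL_DATA_BASE is available as a shell variable. The helper name source_root is hypothetical.

```shell
#!/usr/bin/env sh
# source_root NAME  (hypothetical helper)
# Prints the root directory for a source, lowercasing NAME so that it
# follows the naming convention. Assumes EXTERNAL_DATA_BASE is set.
source_root() {
    lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf '%s/%s\n' "$EXTERNAL_DATA_BASE" "$lower"
}

# Example usage:
# EXTERNAL_DATA_BASE=/data/external
# source_root Ensembl    # prints /data/external/ensembl
```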
In addition to the data source directories, the following items are found under the package's root.