Releases: iquasere/UPIMAPI

Simplified database download

29 Dec 13:07

When specifying a database, there are three options:

  1. use one of three reserved values: uniprot, swissprot or taxids
  2. provide a FASTA database
  3. provide a DIAMOND-formatted database

UPIMAPI first checks whether a DIAMOND version of the database exists and, if it finds one, runs annotation with it. It looks for:

  1. in the --resources-directory folder, either uniprot.dmnd, uniprot_sprot.dmnd or taxids_database.dmnd
  2. the database filename with its extension replaced by .dmnd
  3. the database filename itself

If no DIAMOND database is found, UPIMAPI searches for the FASTA version and, if it finds one, converts it to DIAMOND format. It looks for:

  1. in the --resources-directory folder, either uniprot.fasta, uniprot_sprot.fasta or taxids_database.fasta
  2. the database filename itself
  3. otherwise, UPIMAPI exits with a file-not-found error
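The lookup order above can be sketched in Python as follows. This is a minimal illustration, not UPIMAPI's actual code: the `resolve_database` function, the `RESERVED_STEMS` mapping, and the rule that a file only counts as DIAMOND-formatted when it ends in .dmnd are all assumptions.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical mapping from reserved values to file stems in --resources-directory
RESERVED_STEMS = {
    "uniprot": "uniprot",
    "swissprot": "uniprot_sprot",
    "taxids": "taxids_database",
}


def resolve_database(database: str, resources_directory: str) -> Tuple[str, Path]:
    """Return ("dmnd", path) for a ready DIAMOND database, ("fasta", path) for a
    FASTA database that still needs converting, or raise FileNotFoundError."""
    resources = Path(resources_directory)
    stem = RESERVED_STEMS.get(database)

    # First pass: look for a DIAMOND-formatted database.
    if stem is not None:
        dmnd = resources / f"{stem}.dmnd"
    else:
        # Extension replaced with .dmnd; for input that already ends in .dmnd
        # this is the filename itself (assumption: DIAMOND files end in .dmnd).
        dmnd = Path(database).with_suffix(".dmnd")
    if dmnd.is_file():
        return "dmnd", dmnd

    # Second pass: look for a FASTA database to convert with DIAMOND.
    fasta = resources / f"{stem}.fasta" if stem is not None else Path(database)
    if fasta.is_file() and fasta.suffix != ".dmnd":
        return "fasta", fasta

    # Nothing usable was found.
    raise FileNotFoundError(f"No DIAMOND or FASTA database found for {database!r}")
```

A caller would run the annotation directly on a "dmnd" result and run DIAMOND's makedb first on a "fasta" result.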

This removes the need to tinker with the --skip-db-check parameter, but places more trust in the user.

Sanitization of mapping columns

19 Dec 12:07

Invalid columns can no longer be specified

UPIMAPI now reports an error and exits with a non-zero code.
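A minimal sketch of this kind of column sanitization, assuming the set of available fields is already known (here it is passed in directly; the `validate_columns` name and the error message are hypothetical, not UPIMAPI's code):

```python
import sys
from typing import Iterable, List


def validate_columns(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the requested columns, or exit with status 1 if any is unknown."""
    available_set = set(available)
    requested = list(requested)
    invalid = [column for column in requested if column not in available_set]
    if invalid:
        # Report every bad column at once, then stop with a non-zero exit code
        print(f"Invalid columns: {', '.join(invalid)}", file=sys.stderr)
        sys.exit(1)
    return requested
```

Collecting all invalid columns before exiting lets the user fix every typo in one pass instead of one per run.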

New command for showing available fields

upimapi --show available-fields prints the columns available for ID mapping, properly capitalized and extracted directly from UniProt's return fields page.

Fixed parsing of custom inputted "-cols"

21 Nov 15:47

This concerns the handling of the columns Organism, Organism (ID), Taxonomic lineage and Taxonomic lineage IDs when any of the Taxonomic lineage (LEVEL) or Taxonomic lineage IDs (LEVEL) columns are specified.

UPIMAPI now correctly adds and discards these columns during execution, according to the respective conditions.

UPIMAPI also now detects compressed input: if an input file is specified and ends with .zip, .tar, .gz or .bz2, UPIMAPI stops executing and exits.
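A minimal sketch of the compressed-input check, using the four extensions listed above; `check_not_compressed` and its message are hypothetical. Passing a string to `sys.exit` prints it to stderr and exits with status 1.

```python
import sys
from typing import Optional

# Extensions named in these release notes (".tar.gz" is caught by ".gz")
COMPRESSED_EXTENSIONS = (".zip", ".tar", ".gz", ".bz2")


def check_not_compressed(input_file: Optional[str]) -> None:
    """Exit if an input file was given and looks compressed."""
    if input_file is not None and input_file.endswith(COMPRESSED_EXTENSIONS):
        sys.exit(f"{input_file} appears to be compressed; please decompress it first.")
```

`str.endswith` accepts a tuple, so one call covers all extensions.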

Fixed handling taxonomic columns

27 Sep 15:18

Columns were not being parsed correctly: repeated columns were being output, namely Taxonomic lineage (SPECIES) and Taxonomic lineage IDs (SPECIES).

The repository structure was also simplified extensively, with everything moved into the cicd folder.

Sorted the input of taxonomic columns

26 Sep 13:49

Specifying taxonomic columns (e.g., Taxonomic lineage (SPECIES), Taxonomic lineage IDs (SUPERKINGDOM)) always caused the columns Taxonomic lineage and Taxonomic lineage (Ids) to be output as well.

These columns are now only output when explicitly requested.
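One way to get this behaviour is to fetch the base lineage column for every level-specific column, then drop the base column before output unless it was itself requested. The sketch below assumes that approach; `plan_columns` and the regular expression are hypothetical, not UPIMAPI's implementation.

```python
import re
from typing import List, Tuple

# Matches e.g. "Taxonomic lineage (SPECIES)" or "Taxonomic lineage IDs (SUPERKINGDOM)"
LEVEL_COLUMN = re.compile(r"^(Taxonomic lineage(?: IDs)?) \((\w+)\)$")


def plan_columns(requested: List[str]) -> Tuple[List[str], List[str]]:
    """Return (columns to fetch from UniProt, columns to drop before output)."""
    to_fetch: List[str] = []
    for column in requested:
        match = LEVEL_COLUMN.match(column)
        # Level-specific columns are derived from their base lineage column
        base = match.group(1) if match else column
        if base not in to_fetch:
            to_fetch.append(base)
    # Base columns fetched only to derive levels are not part of the output
    to_drop = [column for column in to_fetch if column not in requested]
    return to_fetch, to_drop
```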

Also, several fixes

Fixed taxonomy being output with an extra space (e.g., for Bacteria).
Fixed an error thrown when no additional IDs were mapped.
Fixed the case where no columns are specified.
Fixed retrieving FASTA sequences: the request was badly formatted.

From/To ID mapping implemented

07 Jul 12:55

Implemented the ID mapping available at https://www.uniprot.org/id-mapping. It is triggered when "From database" and "To database" differ from their default values, "UniProtKB AC/ID" and "UniProtKB".

Two new parameters: --from-db and --to-db. Their possible values are listed at https://rest.uniprot.org/configure/idmapping/fields.
They can also be displayed by passing an invalid value to either parameter, which makes UPIMAPI show the possible options.

UPIMAPI ends execution after performing this new ID mapping, so it cannot be combined with the ID mapping that retrieves columns of information from UniProt.
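The sketch below illustrates the trigger condition and a job submission against UniProt's public REST endpoint https://rest.uniprot.org/idmapping/run, which accepts `from`, `to` and comma-separated `ids` form fields. It is an assumption that UPIMAPI submits jobs this way, and the function names are hypothetical. Note that the REST API expects internal database names (e.g. UniProtKB_AC-ID) rather than the display names quoted above.

```python
import urllib.parse
import urllib.request
from typing import Iterable

# Defaults quoted in these release notes (display names)
DEFAULT_FROM_DB = "UniProtKB AC/ID"
DEFAULT_TO_DB = "UniProtKB"
# Public UniProt REST endpoint that submits an ID mapping job
IDMAPPING_RUN_URL = "https://rest.uniprot.org/idmapping/run"


def uses_database_mapping(from_db: str, to_db: str) -> bool:
    """True when --from-db/--to-db differ from the defaults."""
    return from_db != DEFAULT_FROM_DB or to_db != DEFAULT_TO_DB


def build_idmapping_request(ids: Iterable[str], from_db: str, to_db: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits an ID mapping job.
    from_db/to_db must be REST API names such as "UniProtKB_AC-ID"."""
    form = {"from": from_db, "to": to_db, "ids": ",".join(ids)}
    data = urllib.parse.urlencode(form).encode()
    return urllib.request.Request(IDMAPPING_RUN_URL, data=data, method="POST")
```

Sending the request with `urllib.request.urlopen(req)` returns JSON containing a `jobId`, whose progress can then be polled at https://rest.uniprot.org/idmapping/status/{jobId}.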

Re-added pyyaml as a dependency, since api_info is obtained again and used directly.

Columns outputted in order of input

22 Jun 10:26

Columns were being output in arbitrary order because UPIMAPI's code used set operations on them.

Columns are now output in the order in which the user specifies them.
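The underlying issue is that Python sets do not preserve insertion order. A minimal sketch of order-preserving deduplication, relying on `dict` keys keeping insertion order (guaranteed since Python 3.7); `unique_in_order` and `remove_columns` are hypothetical helper names:

```python
from typing import Iterable, List


def unique_in_order(columns: Iterable[str]) -> List[str]:
    """Drop repeated columns while keeping the user's order."""
    # dict keys are unique and keep insertion order, unlike a set
    return list(dict.fromkeys(columns))


def remove_columns(columns: Iterable[str], to_remove: Iterable[str]) -> List[str]:
    """Remove columns without reordering the rest."""
    drop = set(to_remove)  # a set is fine here: it is only used for lookups
    return [column for column in columns if column not in drop]
```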

Fix on default memory

05 Jun 09:11

When memory is specified with --max-memory, UPIMAPI assumes it is in GB.

By default (when --max-memory is not given), UPIMAPI checked the available memory, which is reported in bytes. This led to memory values that were far too large, which in turn led to block-size values that were too small, so the reference database was split into too many blocks and UPIMAPI/DIAMOND took forever.

UPIMAPI now converts the default memory to GB before determining block-size and number-of-chunks.
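A minimal sketch of the corrected default, assuming the available memory is obtained in bytes and that GB here means 1024**3 bytes; `resolve_memory_gb` is a hypothetical name, and how UPIMAPI then derives block-size and number-of-chunks is not shown.

```python
from typing import Optional

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def resolve_memory_gb(max_memory: Optional[float], available_bytes: int) -> float:
    """Memory in GB on which DIAMOND's block-size and number-of-chunks are based."""
    if max_memory is not None:
        # --max-memory is already given in GB
        return float(max_memory)
    # Default: available memory arrives in bytes and must be converted;
    # using it as-is was the bug described above.
    return available_bytes / BYTES_PER_GB
```

The available byte count could come, for example, from `psutil.virtual_memory().available`, which reports bytes.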

Important and nice options for homology search

01 Jun 08:58

Added control over DIAMOND search

--diamond-mode accepts six options (in order of increasing search time and sensitivity): fast, mid_sensitive, sensitive, more_sensitive, very_sensitive and ultra_sensitive.
Choosing a faster mode can dramatically decrease search times, and can also reduce memory usage and, apparently, disk usage (it is unclear why for the latter).

Added parameter for max memory

Set with --max-memory, read as a float in GB.
Allows DIAMOND parameters b (block size) and c (index chunks) to be calculated automatically.
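A sketch of how the mode and the memory-derived parameters could be passed to DIAMOND. The flags used (`--query`, `--db`, `--out`, `--block-size`, `--index-chunks` and sensitivity flags such as `--more-sensitive`) are real DIAMOND options, but `build_diamond_command` and the underscore-to-hyphen translation are assumptions about how UPIMAPI maps its own values.

```python
from typing import List, Optional

# --diamond-mode values, from fastest to most sensitive
DIAMOND_MODES = ("fast", "mid_sensitive", "sensitive", "more_sensitive",
                 "very_sensitive", "ultra_sensitive")


def build_diamond_command(query: str, database: str, output: str, mode: str,
                          block_size: Optional[float] = None,
                          index_chunks: Optional[int] = None) -> List[str]:
    """Build a `diamond blastp` argument list for subprocess.run."""
    if mode not in DIAMOND_MODES:
        raise ValueError(f"--diamond-mode must be one of: {', '.join(DIAMOND_MODES)}")
    cmd = ["diamond", "blastp", "--query", query, "--db", database, "--out", output,
           "--" + mode.replace("_", "-")]  # e.g. more_sensitive -> --more-sensitive
    if block_size is not None:
        cmd += ["--block-size", str(block_size)]  # DIAMOND's -b
    if index_chunks is not None:
        cmd += ["--index-chunks", str(index_chunks)]  # DIAMOND's -c
    return cmd
```

The resulting list can be run with `subprocess.run(cmd, check=True)`, which raises if DIAMOND exits with an error.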

Also two small bug fixes

Fixed the case where the database was given as a FASTA file together with --skip-db-check: UPIMAPI would pass the FASTA database directly to DIAMOND.
Fixed days being output as floats. Days don't float.
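Integer days come from floor division (here via `divmod` on integers) rather than true division with `/`. The sketch below assumes elapsed time is reported as days plus HH:MM:SS; `format_elapsed` and its output format are hypothetical, not UPIMAPI's.

```python
def format_elapsed(seconds: float) -> str:
    """Format a duration as whole days plus HH:MM:SS."""
    total = int(seconds)              # drop fractional seconds
    days, rem = divmod(total, 86400)  # divmod on ints yields int quotients
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    unit = "day" if days == 1 else "days"
    return f"{days} {unit}, {hours:02d}:{minutes:02d}:{secs:02d}"
```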

Added selection of mirror to download UniProt from

15 May 16:03

New parameter --mirror determines where UniProt is downloaded from. It allows the following options: