Prepare for 0.3.2 #80

Merged 15 commits on Feb 10, 2022
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- data/Data.py: Added `update_possible_rt_keys()` and `update_possible_ri_keys()` methods for `Data` class.
Now users can supply their own identifiers for RT/RI columns (csv files) and identifiers (msp) in their data. [#74](https://github.com/RECETOX/RIAssigner/pull/74)
- General: Added missing documentation and updated [README](README.md). [#80](https://github.com/RECETOX/RIAssigner/pull/80)
- setup.py: Added versions for dependencies. [#80](https://github.com/RECETOX/RIAssigner/pull/80)
### Changed
- data/Data.py: Unified default RT/RI column identifiers between `PandasData` and `MatchMSData` classes. [#74](https://github.com/RECETOX/RIAssigner/pull/74)
- data/MatchMSData.py: `MatchMSData` class now looks up RT and RI identifiers in the default identifiers list
32 changes: 23 additions & 9 deletions README.md
@@ -6,18 +6,29 @@
[![bioconda package](https://img.shields.io/conda/v/bioconda/riassigner)](https://anaconda.org/bioconda/riassigner)

## Overview
RIAssigner is a python tool for retention index (RI) computation for GC-MS data developed at [RECETOX](https://www.recetox.muni.cz/en).
RIAssigner is a python tool for retention index (RI) computation for GC-MS data developed at [RECETOX](https://www.recetox.muni.cz/en) and hosted on [Galaxy](https://umsa.cerit-sc.cz/).

The [retention index](https://goldbook.iupac.org/terms/view/R05360) is a mapping of retention time, making the retention time of compounds on different columns comparable, i.e to compounds might have different retention times on different columns, but a very similar retention index. To compute this index, a set of reference compounds - often an inert alkane series - is analyzed as part of the batch (on the same column). The retention index of the alkanes are fixed (carbon number x 100) and any query compounds can be assigned a retention index depending on its retention time. This can be done via piecewise linear interpolation or other mathematical methods.
The [retention index](https://goldbook.iupac.org/terms/view/R05360) is a mapping of retention time that makes the retention data of compounds comparable, i.e. a compound might have different retention times in different experiments, but a very similar retention index.
To compute this index, a set of reference compounds - often an inert alkane series - is analyzed as part of the batch (on the same column).
The retention indices of the alkanes are fixed (carbon number x 100) and any query compound can be assigned a retention index depending on its retention time.
This can be done via piecewise linear interpolation or other mathematical methods.
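
The interpolation idea described above can be sketched in a few lines. This is only an illustration built on `numpy.interp`, not RIAssigner's own implementation; the function name `interpolate_ri` and the alkane retention times are made up for the example.

```python
import numpy as np


def interpolate_ri(query_rt, reference_rt, reference_ri):
    """Assign retention indices by piecewise linear interpolation (illustrative sketch)."""
    reference_rt = np.asarray(reference_rt, dtype=float)
    reference_ri = np.asarray(reference_ri, dtype=float)
    # np.interp requires increasing x-coordinates, so sort the reference by RT first.
    order = np.argsort(reference_rt)
    return np.interp(query_rt, reference_rt[order], reference_ri[order])


# Hypothetical alkane series C10..C12 with RI = carbon number x 100.
alkane_rt = [100.0, 200.0, 400.0]
alkane_ri = [1000, 1100, 1200]

ri = interpolate_ri([150.0, 300.0], alkane_rt, alkane_ri)
```

Note that `numpy.interp` clamps queries outside the reference range to the first or last reference value, so a real tool needs to handle such peaks explicitly.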

## Installation
Installation is currently possible by creating the conda environment with `conda env create -f conda/environment-dev.yml` and then installing the package with `python -m pip install -e .`

Install via [bioconda](https://anaconda.org/bioconda/riassigner) using `conda install -c bioconda riassigner`
(1) From source by cloning the repository and then installing the package with `pip`.
```
git clone https://github.com/RECETOX/RIAssigner.git
cd RIAssigner
python -m pip install -e .
```
(2) Install via [bioconda](https://anaconda.org/bioconda/riassigner) in your existing environment.
```
conda install -c bioconda riassigner
```

## Usage
RIAssigner can be used to read data from `.msp`, `.csv` and `.tsv` files using [matchms](https://github.com/matchms/matchms) and [pandas](https://pandas.pydata.org/) and to compute the retention indices for the data.
A reference list of retention indexed compounds (traditionally an Alkane series) with retention times is used to compute the RI for a query dataset of retention time values using the [van Den Dool and Kratz](https://doi.org/10.1016/S0021-9673(01)80947-X) method or by using [cubic spline based interpolation](https://doi.org/10.1021/ac50035a026).
A reference list of retention indexed compounds (traditionally an Alkane series) with retention times is used to compute the RI for a query dataset of retention time values using the [van Den Dool and Kratz](https://doi.org/10.1016/S0021-9673(01)80947-X) method or by using [cubic spline-based interpolation](https://doi.org/10.1021/ac50035a026).
### Example
```python
from RIAssigner.compute import Kovats
@@ -31,11 +42,14 @@ reference = MatchMSData("../tests/data/msp/Alkanes_20210325.msp", "msp", rt_unit
query.retention_indices = Kovats().compute(query, reference)
query.write("peaks_with_rt.csv")
```
For more details check out this [notebook](doc/example_usage.ipynb) or try this tool on [Galaxy](https://umsa.cerit-sc.cz/).
For more details check out this [notebook](doc/example_usage.ipynb).

## Developer Documentation
### Setup
Create your development environment using the provided [script](conda/environment-dev.yml) via conda to install all required dependencies, including linter and testing frameworks.
Create your development conda environment using the provided [file](conda/environment-dev.yml) to install all required dependencies, including linter and testing frameworks.
```
conda env create -f conda/environment-dev.yml
```

### Contributing
We appreciate contributions - feel free to open an issue on our repository, create your own fork, work on the problem and open a PR.
@@ -65,13 +79,13 @@ classDiagram
+read(string filename)
+write(string filename)
+retention_times() List~float~
+retention_indices() List~int~
+retention_indices() List~float~
}


class ComputationMethod{
<<interface>>
+compute(Data query, Data reference) List~int~
+compute(Data query, Data reference) List~float~

}

2 changes: 0 additions & 2 deletions RIAssigner/__init__.py
@@ -4,8 +4,6 @@

logging.getLogger(__name__).addHandler(logging.NullHandler())

__author__ = "Helge Hecht"
__email__ = 'helge.hecht@recetox.muni.cz'
__all__ = [
"__version__",
]
1 change: 0 additions & 1 deletion RIAssigner/__main__.py
@@ -1,5 +1,4 @@
import sys
from typing import Tuple
from RIAssigner.compute.ComputationMethod import ComputationMethod
import argparse

17 changes: 17 additions & 0 deletions RIAssigner/cli/LoadDataAction.py
@@ -4,10 +4,27 @@


class LoadDataAction(argparse.Action):
"""Action to create a `Data` instance from command-line arguments.
Inherits from `argparse.Action`.
"""
def __init__(self, option_strings, dest, **kwargs):
"""Constructor

Args:
option_strings (List[str]): See argparse.Action.
dest (str): See argparse.Action.
"""
super().__init__(option_strings, dest, **kwargs)

def __call__(self, parser, namespace, values, option_string=None):
"""Overridden method from `argparse.Action` which is called upon invocation.

Args:
parser (argparse.ArgumentParser): Argument parser with args.
namespace (argparse.Namespace): Namespace object that holds the parsed arguments.
values (List[object]): Values passed as parameters to the Action.
option_string (str, optional): Option string used to invoke the Action. Defaults to None.
"""
filename = values[0]
filetype = values[1]
rt_unit = values[2]
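
For readers unfamiliar with custom `argparse` actions, the pattern used here (one option consuming three values) can be sketched on its own. The class name `BundleArgsAction`, the `--query` option, and the dictionary it stores are hypothetical and chosen only for illustration; the real `LoadDataAction` builds a `Data` instance instead.

```python
import argparse


class BundleArgsAction(argparse.Action):
    """Hypothetical action that bundles three positional values into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        # With nargs=3, argparse passes a list of exactly three strings.
        filename, filetype, rt_unit = values
        bundle = {"filename": filename, "filetype": filetype, "rt_unit": rt_unit}
        setattr(namespace, self.dest, bundle)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--query",
    nargs=3,
    action=BundleArgsAction,
    metavar=("FILE", "TYPE", "UNIT"),
)

args = parser.parse_args(["--query", "peaks.csv", "csv", "min"])
```

After parsing, `args.query` holds the bundled values, which is where an action like the one in this diff would place the object it constructs.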
9 changes: 9 additions & 0 deletions RIAssigner/compute/ComputationMethod.py
@@ -7,6 +7,15 @@ class ComputationMethod(ABC):

@abstractmethod
def compute(self, query: Data, reference: Data) -> List[float]:
"""Abstract method for RI computation

Args:
query (Data): Dataset for which to compute the RI
reference (Data): Dataset with retention times & retention index for reference.

Returns:
List[float]: Computed retention indices
"""
...

def _check_data_args(self, query, reference):
37 changes: 36 additions & 1 deletion RIAssigner/data/Data.py
@@ -16,6 +16,14 @@ class Data(ABC):

@staticmethod
def is_valid(rt: RetentionTimeType) -> bool:
"""Determine whether a retention time value is valid

Args:
rt (RetentionTimeType): Value to check for validity.

Returns:
bool: State of validity (True/False).
"""
return rt is not None and rt >= 0.0

@classmethod
@@ -55,27 +63,54 @@ def __init__(self, filename: str, filetype: str, rt_unit: str):

@abstractmethod
def read(self):
"""Method to initialize internal data storage.
"""
...

@abstractmethod
def write(self, filename):
"""Store current content to disk.

Args:
filename (str): Path to output filename.
"""
...

@property
def filename(self):
def filename(self) -> str:
"""Getter for filename property.

Returns:
str: Filename of originally loaded data.
"""
return self._filename

@property
@abstractmethod
def retention_times(self) -> Iterable[RetentionTimeType]:
"""Getter for `retention_times` property.

Returns:
Iterable[RetentionTimeType]: RT values contained in data.
"""
...

@property
@abstractmethod
def retention_indices(self) -> Iterable[RetentionIndexType]:
"""Getter for `retention_indices` property.

Returns:
Iterable[RetentionIndexType]: RI values stored in data.
"""
...

@retention_indices.setter
@abstractmethod
def retention_indices(self, value: Iterable[RetentionIndexType]):
"""Setter for `retention_indices` variable.

Args:
value (Iterable[RetentionIndexType]): Values to assign to property.
"""
...
8 changes: 8 additions & 0 deletions RIAssigner/data/MatchMSData.py
@@ -69,6 +69,14 @@ def _sort_spectra_by_rt(self):
self._spectra.sort(key=lambda spectrum: safe_read_key(spectrum, self._rt_key))

def __eq__(self, o: object) -> bool:
"""Comparison operator `==`.

Args:
o (object): Object to compare with.

Returns:
bool: State of equality.
"""
if not isinstance(o, MatchMSData):
return False
other: MatchMSData = o
13 changes: 13 additions & 0 deletions RIAssigner/data/PandasData.py
@@ -64,6 +64,14 @@ def _sort_by_rt(self):
self._data.sort_values(by=self._rt_index, axis=0, inplace=True)

def __eq__(self, o: object) -> bool:
"""Comparison operator `==`.

Args:
o (object): Object to compare with.

Returns:
bool: State of equality.
"""
if not isinstance(o, PandasData):
return False
other: PandasData = o
@@ -97,4 +105,9 @@ def _ri_from_carbon_numbers(self):

@retention_indices.setter
def retention_indices(self, values: Iterable[int]):
"""Setter for `retention_indices` property.

Args:
values (Iterable[int]): Values to assign.
"""
self._data[self._ri_index] = values
7 changes: 4 additions & 3 deletions setup.py
@@ -20,7 +20,8 @@
long_description=readme,
long_description_content_type="text/markdown",
author="Helge Hecht, Maksym Skoryk",
author_email="helge.hecht@recetox.muni.cz, 245816@muni.cz",
author_email="helge.hecht@recetox.muni.cz, maksym.skoryk@recetox.muni.cz",

maintainer="RECETOX",
maintainer_email="GalaxyToolsDevelopmentandDeployment@space.muni.cz",
url="https://github.com/RECETOX/RIAssigner",
@@ -46,10 +47,10 @@
test_suite="tests",
python_requires='>=3.7',
install_requires=[
"matchms",
"matchms>=0.9.1",
"numpy",
"pandas",
"pint",
"pint>=0.17",
"scipy"
],
extras_require={