Tool for GC annotation #536

Merged on May 21, 2024 (11 commits)
12 changes: 12 additions & 0 deletions tools/rename_annotated_feature/.shed.yml
@@ -0,0 +1,12 @@
name: rename_annotated_feature
owner: recetox
description: Update column names in an abundance table using an annotation table with spectral matching results
homepage_url: https://github.com/RECETOX/galaxytools/
long_description: |
  Renames column features in the abundance table based on corresponding annotations.
  It operates in two modes: 'single' and 'multiple'. In 'single' mode, it selects the
  highest scoring match for renaming, while in 'multiple' mode, it includes all matches with scores.
categories:
  - Metabolomics
remote_repository_url: "https://github.com/RECETOX/galaxytools/tree/master/tools/rename_annotated_feature"
type: unrestricted
26 changes: 26 additions & 0 deletions tools/rename_annotated_feature/macros.xml
@@ -0,0 +1,26 @@
<macros>
    <token name="@TOOL_VERSION@">1.0.0</token>
    <xml name="creator">
        <creator>
            <person
                givenName="Wudmir"
                familyName="Rojas"
                url="https://github.com/wverastegui"
                identifier="0000-0001-7036-9987" />
            <person
                givenName="Helge"
                familyName="Hecht"
                url="https://github.com/hechth"
                identifier="0000-0001-6744-996X" />
            <organization
                url="https://www.recetox.muni.cz/"
                email="GalaxyToolsDevelopmentandDeployment@space.muni.cz"
                name="RECETOX MUNI"/>
        </creator>
    </xml>
    <token name="@HELP@"><![CDATA[
This tool uses an annotations table to rename columns in an abundance table. It accepts paths to these tables,
a renaming mode ("single" or "multiple"), and an output path for the tab-separated result. 'Single' mode renames
each column to its highest-scoring match; 'multiple' mode renames each column to a comma-separated list of all its matches.
    ]]></token>
</macros>
94 changes: 94 additions & 0 deletions tools/rename_annotated_feature/rename_annotated_feature.py
@@ -0,0 +1,94 @@
import argparse
from collections import defaultdict
from typing import Tuple

import pandas as pd


def parse_arguments() -> argparse.Namespace:
    """Parses command-line arguments.

    Returns:
        argparse.Namespace: Namespace with argument values as attributes.
    """
    parser = argparse.ArgumentParser(description='Rename annotated feature.')
    parser.add_argument('--annotations_table_path', type=str, required=True, help='Path to the annotations table file.')
    parser.add_argument('--abundance_table_path', type=str, required=True, help='Path to the abundance table file.')
    parser.add_argument('--mode', type=str, choices=['single', 'multiple'], default='single', help='Mode to use for renaming. Can be "single" or "multiple".')
    parser.add_argument('--output_path', type=str, default='output.csv', help='Path to the output CSV file.')
    return parser.parse_args()


def load_tables(annotations_table_path: str, abundance_table_path: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Loads annotation and abundance tables from files.

    Args:
        annotations_table_path (str): Path to the annotations table file.
        abundance_table_path (str): Path to the abundance table file.

    Returns:
        Tuple[pd.DataFrame, pd.DataFrame]: Tuple of DataFrames for annotations and abundance tables.
    """
    annotations_table = pd.read_table(annotations_table_path)
    abundance_table = pd.read_table(abundance_table_path)

    annotations_table.columns = annotations_table.columns.str.strip()
    abundance_table.columns = abundance_table.columns.str.strip()

    return annotations_table, abundance_table


def rename_single(annotations_table: pd.DataFrame, abundance_table: pd.DataFrame) -> None:
    """Renames columns in abundance table based on single best match in annotations table.

    Args:
        annotations_table (pd.DataFrame): DataFrame of annotations.
        abundance_table (pd.DataFrame): DataFrame of abundance data.
    """
    scores_col = annotations_table.columns[-1]
    ref_idxs = annotations_table.groupby("query")[scores_col].idxmax()
    results = annotations_table.loc[ref_idxs]

    queries = results["query"]
    refs = results["reference"]

    mapping = dict(zip(queries, refs))
    abundance_table.rename(columns=mapping, inplace=True)


def rename_multiple(annotations_table: pd.DataFrame, abundance_table: pd.DataFrame) -> None:
    """Renames columns in abundance table based on multiple matches in annotations table.

    Args:
        annotations_table (pd.DataFrame): DataFrame of annotations.
        abundance_table (pd.DataFrame): DataFrame of abundance data.
    """
    queries = annotations_table["query"]
    refs = annotations_table["reference"]

    mapping = defaultdict(list)
    for query, ref in zip(queries, refs):
        mapping[query].append(ref)

    for query, refs in mapping.items():
        new_column_name = ', '.join(refs)
        if query in abundance_table.columns:
            abundance_table.rename(columns={query: new_column_name}, inplace=True)


def main() -> None:
    """Main function to parse arguments, load tables, rename columns, and save output."""
    args = parse_arguments()

    annotations_table, abundance_table = load_tables(args.annotations_table_path, args.abundance_table_path)

    if args.mode == "single":
        rename_single(annotations_table, abundance_table)
    else:
        rename_multiple(annotations_table, abundance_table)

    abundance_table.to_csv(args.output_path, sep="\t", index=False)


if __name__ == "__main__":
    main()
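The difference between the two renaming modes in the script above can be illustrated without pandas. The following is a minimal plain-Python sketch, not the tool's implementation: the feature IDs, compound names, scores, and the helper names `single_mapping`, `multiple_mapping`, and `rename` are all hypothetical and chosen only for illustration. As in the script, 'multiple' mode here joins reference names with `', '` and does not append scores.

```python
from collections import defaultdict

# Hypothetical toy annotations: (query, reference, score) rows.
annotations = [
    ("f1", "caffeine", 0.91),
    ("f1", "theobromine", 0.62),
    ("f2", "glucose", 0.80),
]

# Hypothetical abundance table header; "f3" has no annotation.
columns = ["sample", "f1", "f2", "f3"]


def single_mapping(rows):
    """Keep only the highest-scoring reference for each query."""
    best = {}
    for query, ref, score in rows:
        # Strict '>' keeps the first row on ties.
        if query not in best or score > best[query][1]:
            best[query] = (ref, score)
    return {query: ref for query, (ref, _) in best.items()}


def multiple_mapping(rows):
    """Join every reference for a query into one comma-separated name."""
    grouped = defaultdict(list)
    for query, ref, _ in rows:
        grouped[query].append(ref)
    return {query: ", ".join(refs) for query, refs in grouped.items()}


def rename(cols, mapping):
    """Rename annotated columns; leave unannotated ones untouched."""
    return [mapping.get(col, col) for col in cols]


print(rename(columns, single_mapping(annotations)))
print(rename(columns, multiple_mapping(annotations)))
```

In this sketch, columns without any annotation (such as `f3`) keep their original name in both modes, which matches how `DataFrame.rename` treats keys missing from the mapping.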
53 changes: 53 additions & 0 deletions tools/rename_annotated_feature/rename_annotated_feature.xml
@@ -0,0 +1,53 @@
<tool id="rename_annotated_feature" name="Rename Annotated Feature" version="@TOOL_VERSION@+galaxy0" profile="21.09">
    <description>Rename columns in abundance table based on annotations table</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="creator"/>
    <requirements>
        <requirement type="package" version="2.2.1">pandas</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        python3 '${__tool_directory__}/rename_annotated_feature.py'
        --annotations_table_path '$annotations_table_path'
        --abundance_table_path '$abundance_table_path'
        --mode '$mode'
        --output_path '$output_path'
    ]]></command>
    <inputs>
        <param name="annotations_table_path" type="data" format="tabular" label="Annotations table file" help="Path to the annotations table file."/>
        <param name="abundance_table_path" type="data" format="tabular" label="Abundance table file" help="Path to the abundance table file."/>
        <param name="mode" type="select" label="Mode to use for renaming" help="Can be single or multiple.">
            <option value="single" selected="true">Single</option>
            <option value="multiple">Multiple</option>
        </param>
    </inputs>
    <outputs>
        <data name="output_path" format="tabular" label="Renamed abundance table"/>
    </outputs>
    <tests>
        <test>
            <param name="annotations_table_path" value="annotated_table.tsv" ftype="tabular"/>
            <param name="abundance_table_path" value="abundance_table.tsv" ftype="tabular"/>
            <param name="mode" value="single"/>
            <output name="output_path" file="single_mode_output.tsv"/>
        </test>
        <test>
            <param name="annotations_table_path" value="annotated_table.tsv" ftype="tabular"/>
            <param name="abundance_table_path" value="abundance_table.tsv" ftype="tabular" />
            <param name="mode" value="multiple"/>
            <output name="output_path" file="multi_mode_output.tsv"/>
        </test>
    </tests>
    <help>
        <![CDATA[
@HELP@
        ]]>
    </help>
    <!-- Update to the correct citation for this tool -->
    <citations>
        <citation type="doi">10.5281/zenodo.7178586</citation>
        <citation type="doi">10.21105/joss.02411</citation>
        <citation type="doi">10.1021/ac501530d</citation>
    </citations>
</tool>
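For trying the script outside Galaxy, the input layout it relies on can be sketched as follows. This is an illustrative setup under stated assumptions, not the repository's test data: the file names `toy_annotations.tsv`, `toy_abundance.tsv`, and `renamed.tsv` and all values are hypothetical. What comes from the script itself is that both inputs are tab-separated, that the annotations table has `query` and `reference` columns, and that 'single' mode reads scores from the last column. The command-line flags are the ones defined in `parse_arguments`.

```python
import csv
import os
import tempfile

# Hypothetical working directory and file names, for illustration only.
workdir = tempfile.mkdtemp()
annotations_path = os.path.join(workdir, "toy_annotations.tsv")
abundance_path = os.path.join(workdir, "toy_abundance.tsv")
output_path = os.path.join(workdir, "renamed.tsv")

# Annotations: 'query' and 'reference' columns, score in the last column.
annotation_rows = [
    ["query", "reference", "score"],
    ["f1", "caffeine", "0.91"],
    ["f1", "theobromine", "0.62"],
    ["f2", "glucose", "0.80"],
]

# Abundance table: column names that match 'query' values get renamed.
abundance_rows = [
    ["sample", "f1", "f2"],
    ["s1", "10.5", "3.2"],
    ["s2", "8.1", "4.0"],
]


def write_tsv(path, rows):
    """Write rows as a tab-separated file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


write_tsv(annotations_path, annotation_rows)
write_tsv(abundance_path, abundance_rows)

# Command line mirroring the <command> section of the tool wrapper.
command = [
    "python3", "rename_annotated_feature.py",
    "--annotations_table_path", annotations_path,
    "--abundance_table_path", abundance_path,
    "--mode", "single",
    "--output_path", output_path,
]
print(" ".join(command))
```

The sketch only prepares the files and prints the command; running it requires the script and pandas to be available locally.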