Add documentation and docstrings, improve DIA-NN parsing, refactoring #378

Merged on Sep 5, 2024 (50 commits)

Commits
be3f18c
fix posixpath error in maxquant parameter parsing file
SamvPy Sep 4, 2024
05e6d2a
add documentation
Cajac102 Sep 4, 2024
6f16282
add documentation, uncomment error handling
Cajac102 Sep 4, 2024
819125c
add new parameter fields and refactor diann parameter parsing
SamvPy Sep 3, 2024
225ed78
all_datapoints now optional argument, added documentation
Cajac102 Sep 4, 2024
f38966a
adapt calls to data point functions
Cajac102 Sep 4, 2024
94dcf8d
Fix DIA-NN parsing, add module_dia_quant test
rodvrees Sep 4, 2024
f3fc148
undo changes to parse_settings_fragpipe.toml
rodvrees Sep 4, 2024
31eea5e
Remove debugging statements
rodvrees Sep 4, 2024
01fca23
Add further tests for module_dia_quant
rodvrees Sep 4, 2024
76f080a
Add Datapoint constructor unittest
rodvrees Sep 5, 2024
0ffe509
bugfix for DIA quant page
Cajac102 Sep 5, 2024
9e6bd57
add documentation to base quant
Cajac102 Sep 5, 2024
37f0de9
Merge branch 'main' into DIA
RobbinBouwmeester Sep 5, 2024
5e50667
DIA-NN support
rodvrees Sep 5, 2024
48cd756
add documentation
Cajac102 Sep 5, 2024
e47e5cc
fix identation
Cajac102 Sep 5, 2024
b658ca9
Merge branch 'main' into DIA
RobbinBouwmeester Sep 5, 2024
1e89222
adapt plotquant import
Cajac102 Sep 5, 2024
46dd487
Add DIA_quant_peptidoform page, make separate custom parse files and …
Alirezak2n Sep 5, 2024
cc6e8a7
reformatting
Cajac102 Sep 5, 2024
f11c571
Fix placeholder_download bug
Alirezak2n Sep 5, 2024
67e5985
Merge branch 'DIA' of https://github.com/Proteobench/ProteoBench into…
Alirezak2n Sep 5, 2024
f5265ec
Fix black
RobbinBouwmeester Sep 5, 2024
1adb2ce
Merge branch 'DIA' of https://github.com/Proteobench/ProteoBench into…
RobbinBouwmeester Sep 5, 2024
5bbe462
Merge branch 'main' into DIA
RobbinBouwmeester Sep 5, 2024
3ce21c1
Merge branch 'main' into DIA
RobbinBouwmeester Sep 5, 2024
bdec42a
AlphaDIA support
rodvrees Sep 5, 2024
5c6dd5e
Update parse_settings_ion.py
Alirezak2n Sep 5, 2024
13cf3eb
Update parse_ion.py
Alirezak2n Sep 5, 2024
d3b85ca
Update parse_ion.py
Alirezak2n Sep 5, 2024
9cb8e08
Update parse_ion.py
Alirezak2n Sep 5, 2024
b758625
Update parse_ion.py
Alirezak2n Sep 5, 2024
f037389
Merge branch 'main' into DIA
RobbinBouwmeester Sep 5, 2024
fc7a684
AlphaDIA support
rodvrees Sep 5, 2024
61f02c9
remove unused ModuleInterface class
Cajac102 Sep 5, 2024
696b688
add documentation
Cajac102 Sep 5, 2024
de76f14
remove abstract moduleInterface class, add documentation
Cajac102 Sep 5, 2024
e96627d
remove abstract Interface class
Cajac102 Sep 5, 2024
a980ff6
black
Cajac102 Sep 5, 2024
de38fd8
Undo debug statements, formatting
rodvrees Sep 5, 2024
db492cb
undo debug statements
rodvrees Sep 5, 2024
4a294be
Fix AlphaDIA contaminant detection
rodvrees Sep 5, 2024
a17102c
Update contributions
rodvrees Sep 5, 2024
c20b263
Merge branch 'main' into DIA
RobbinBouwmeester Sep 5, 2024
fc5f601
add alphadia parameter parsing and edit param parsing test files to n…
SamvPy Sep 3, 2024
592d24c
Merge branch 'main' into DIA
RobbinBouwmeester Sep 5, 2024
5d7a39b
fix maxquant param parsing tests
SamvPy Sep 4, 2024
3d841cb
MaxDIA support
rodvrees Sep 5, 2024
d0e7d6b
black formatting
SamvPy Sep 4, 2024
20 changes: 20 additions & 0 deletions docs/index.rst
@@ -77,6 +77,11 @@ People who contributed to ProteoBench (in alphabetical order)
    *VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium*
    *Department of Biomolecular Medicine, UGent, Ghent, Belgium*

.. line-block::
    **Robbe Devreese**
    *VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium*
    *Department of Biomolecular Medicine, UGent, Ghent, Belgium*

.. line-block::
    **Nadezhda T. Doncheva**
    *Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark*
@@ -90,6 +95,11 @@ People who contributed to ProteoBench (in alphabetical order)
    *VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium*
    *Department of Biomolecular Medicine, UGent, Ghent, Belgium*

.. line-block::
    **Caroline Jachmann**
    *VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium*
    *Department of Biomolecular Medicine, UGent, Ghent, Belgium*

.. line-block::
    **Vedran Kasalica**
    *Netherlands eScience Center, Science Park 402, 1098 XH, Amsterdam, The Netherlands*
@@ -112,6 +122,11 @@ People who contributed to ProteoBench (in alphabetical order)
    *Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), Toulouse, France*
    *Infrastructure nationale de protéomique, ProFI, FR 2048, Toulouse, France*

.. line-block::
    **Alireza Nameni**
    *VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium*
    *Department of Biomolecular Medicine, UGent, Ghent, Belgium*

.. line-block::
    **Martin Rykær**
    *Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark*
@@ -125,6 +140,11 @@ People who contributed to ProteoBench (in alphabetical order)
    *Department of Biomolecular Medicine, Ghent University, Ghent, Belgium*
    *VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium*

.. line-block::
    **Sam van Puyenbroeck**
    *VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium*
    *Department of Biomolecular Medicine, UGent, Ghent, Belgium*

.. line-block::
    **Bart Van Puyvelde**
    *ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Belgium*
16 changes: 16 additions & 0 deletions proteobench/io/params/__init__.py
@@ -55,6 +55,17 @@ class ProteoBenchParameters:
        Minimum precursor charge allowed.
    max_precursor_charge : Optional[int]
        Maximum precursor charge allowed.
    predictors_library : Optional[dict]
        Models used to generate spectral library (DIA-specific).
    scan_window : Optional[int]
        Scan window radius. Ideally corresponds to approximate
        average number of data points per peak (DIA-specific).
    quantification_method_DIANN : Optional[str]
        Quantification strategy used in the DIA-NN engine (DIANN-specific).
    second_pass : Optional[bool]
        Whether second pass search is enabled (DIANN-specific).
    protein_inference : Optional[str]
        Protein inference method used.
    """

    software_name: Optional[str] = None
@@ -77,3 +88,8 @@ class ProteoBenchParameters:
    max_mods: Optional[int] = None  # max_num_modifications
    min_precursor_charge: Optional[int] = None  # precursor_charge
    max_precursor_charge: Optional[int] = None
    scan_window: Optional[int] = None  # DIA-specific
    quantification_method_DIANN: Optional[str] = None  # DIANN-specific
    second_pass: Optional[bool] = None  # DIANN specific
    protein_inference: Optional[str] = None
    predictors_library: Optional[dict] = None
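
As a rough illustration of how the new optional fields behave, the sketch below uses `ParamsSketch`, a hypothetical stand-in that mirrors only a handful of the fields of `ProteoBenchParameters` (it is not the real class). The values passed in are made up for the example.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ParamsSketch:
    """Hypothetical stand-in mirroring a few ProteoBenchParameters fields."""

    software_name: Optional[str] = None
    scan_window: Optional[int] = None  # DIA-specific
    quantification_method_DIANN: Optional[str] = None  # DIANN-specific
    second_pass: Optional[bool] = None  # DIANN-specific
    protein_inference: Optional[str] = None
    predictors_library: Optional[dict] = None


# Example values are invented; a real parser would read them from a log file.
params = ParamsSketch(software_name="AlphaDIA", scan_window=5, protein_inference="heuristic")

# Fields a tool does not report keep their default of None.
unset = [name for name, value in asdict(params).items() if value is None]
print(unset)
```

Fields that a given tool does not report stay `None`, which matches how the AlphaDIA parser in this PR sets `quantification_method_DIANN` and `second_pass`.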
183 changes: 183 additions & 0 deletions proteobench/io/params/alphadia.py
@@ -0,0 +1,183 @@
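import re
from proteobench.io.params import ProteoBenchParameters
import pathlib
import pandas as pd

# Length of the tree-drawing prefix before "──" at each nesting depth;
# the index of a length in this list is the nesting level.
levels = [0, 1, 5, 9, 13, 17]

# Matches ANSI escape sequences (e.g. terminal color codes) so they can be stripped.
ANSI_REGEX = re.compile(r"(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]")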


def parse_line(line):
    # Drop the 22-character log prefix and strip ANSI escape codes
    line = ANSI_REGEX.sub("", line[22:].strip())
    # Split the string into the tab (tree prefix) part and the setting part
    tab, setting = line.split("──")
    setting_list = setting.split(":")
    if len(setting_list) == 1:
        setting_dict = {setting_list[0]: None}
    else:
        setting_dict = {setting_list[0]: setting_list[1]}
    # Convert the tab prefix length to a nesting level
    level = levels.index(len(tab))

    # Return header, parsed setting, and the level
    return setting_list[0], setting_dict, level


def parse_section(
    line,
    line_generator,
):
    section = {}

    # Parse the line (both level and dictionary)
    header_prev, line_dict, level_prev = line
    section.update(line_dict)

    try:
        # Get the next line to know what to do
        next_line = next(line_generator)
        header_next, line_dict_next, level_next = parse_line(next_line)
    except Exception:
        # If no lines are left, go up a level, returning the section so far
        return section, 0, None

    while True:
        # If there are no more lines (next_line is None), go up a level
        try:
            header_next, line_dict_next, level_next = parse_line(next_line)
        except Exception:
            break

        # If the next line is the start of a new (deeper) section
        if level_next > level_prev:
            # Get the subsection

            subsection, _, next_line = parse_section(
                line=parse_line(next_line),
                line_generator=line_generator,
            )
            # Attach the subsection under the previous header
            # The recursive call already returned the next line, so continue
            section[header_prev] = subsection
            continue

        # If the new line is at the same level
        elif level_prev == level_next:
            section.update(line_dict_next)
            header_prev = header_next
            level_prev = level_next
            try:
                next_line = next(line_generator)
            except StopIteration:
                break

        # The next line belongs to a higher level: return the section
        # together with the unconsumed line
        else:
            return section, level_next, next_line

    return section, 0, None


def extract_file_version(line):
    # Regex pattern to extract the version number
    version_pattern = r"version:\s*([\d\.]+)"

    # Search for the version number in the line
    match = re.search(version_pattern, line)

    # Return the version number if found, otherwise None
    version = match.group(1) if match else None
    return version


def add_fdr_parameters(parameter_dict, parsed_settings):
    fdr_value = float(parsed_settings["fdr"]["fdr"])
    fdr_level = parsed_settings["fdr"]["group_level"].strip()

    level_mapping = {"proteins": "ident_fdr_protein"}
    fdr_parameters = {"ident_fdr_psm": None, "ident_fdr_peptide": None, "ident_fdr_protein": None}
    fdr_parameters[level_mapping[fdr_level]] = fdr_value
    parameter_dict.update(fdr_parameters)


def get_min_max(list_of_elements):
    if "(user defined)" in list_of_elements[1]:
        min_value = int(list_of_elements[1].replace("(user defined)", ""))
        if len(list_of_elements) == 4:
            max_value = int(list_of_elements[3].replace("(user defined)", ""))
        else:
            max_value = int(list_of_elements[2])
    else:
        min_value = int(list_of_elements[0])
        if len(list_of_elements) == 3:
            max_value = int(list_of_elements[2].replace("(user defined)", ""))
        else:
            max_value = int(list_of_elements[1])
    return min_value, max_value


def extract_params(fname):
    with open(fname) as f:
        lines_read = f.readlines()
        lines = [line for line in lines_read if "──" in line]

    version = extract_file_version(lines_read[6])

    line_generator = iter(lines)
    first_line = next(line_generator)

    parsed_settings, level, line = parse_section(line=parse_line(first_line), line_generator=line_generator)

    peptide_lengths = get_min_max(list(parsed_settings["library_prediction"]["precursor_len"].keys()))
    precursor_charges = get_min_max(list(parsed_settings["library_prediction"]["precursor_charge"].keys()))

    if "(user defined)" in parsed_settings["search"]["target_ms1_tolerance"]:
        prec_tol = float(parsed_settings["search"]["target_ms1_tolerance"].replace("(user defined)", ""))
    else:
        prec_tol = float(parsed_settings["search"]["target_ms1_tolerance"])
    if "(user defined)" in parsed_settings["search"]["target_ms2_tolerance"]:
        frag_tol = float(parsed_settings["search"]["target_ms2_tolerance"].replace("(user defined)", ""))
    else:
        frag_tol = float(parsed_settings["search"]["target_ms2_tolerance"])

    parameters = {
        "software_name": "AlphaDIA",
        "search_engine": "AlphaDIA",
        "software_version": version,
        "search_engine_version": version,
        "enable_match_between_runs": "?",
        "precursor_mass_tolerance": prec_tol,
        "fragment_mass_tolerance": frag_tol,
        "enzyme": parsed_settings["library_prediction"]["enzyme"].strip(),
        "allowed_miscleavages": int(parsed_settings["library_prediction"]["missed_cleavages"]),
        "min_peptide_length": peptide_lengths[0],
        "max_peptide_length": peptide_lengths[1],
        "min_precursor_charge": precursor_charges[0],
        "max_precursor_charge": precursor_charges[1],
        "fixed_mods": parsed_settings["library_prediction"]["fixed_modifications"].strip(),
        "variable_mods": parsed_settings["library_prediction"]["variable_modifications"].strip(),
        "max_mods": int(parsed_settings["library_prediction"]["max_var_mod_num"].replace("(user defined)", "")),
        "scan_window": int(parsed_settings["selection_config"]["max_size_rt"].replace("(user defined)", "")),
        "quantification_method_DIANN": None,
        "second_pass": None,
        "protein_inference": parsed_settings["fdr"]["inference_strategy"].strip(),
        "predictors_library": "Built-in",
    }

    add_fdr_parameters(parameters, parsed_settings)
    return ProteoBenchParameters(**parameters)


if __name__ == "__main__":
    for fname in [
        "../../../test/params/log_alphadia_1.txt",
        "../../../test/params/log_alphadia_2.txt",
    ]:
        file = pathlib.Path(fname)
        params = extract_params(file)
        data_dict = params.__dict__
        series = pd.Series(data_dict)
        series.to_csv(file.with_suffix(".csv"))
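
To make the line-level parsing in this file easier to follow, here is a minimal standalone sketch of the same idea: drop a fixed-width log prefix, strip ANSI codes, split on `──`, and map the tree-prefix length to a nesting level. The regular expressions and the `LEVELS` list are taken from the diff. The function names `parse_line_sketch` and `extract_version_sketch`, the sample log lines, and their exact layout are assumptions made for illustration and may not match real AlphaDIA output.

```python
import re

# Same patterns and level table as in the diff above.
ANSI_REGEX = re.compile(r"(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]")
VERSION_REGEX = re.compile(r"version:\s*([\d\.]+)")
LEVELS = [0, 1, 5, 9, 13, 17]


def parse_line_sketch(line, prefix_len=22):
    """Return (key, value, level) for one tree-formatted settings line."""
    # Drop the assumed fixed-width log prefix, then remove color codes.
    text = ANSI_REGEX.sub("", line[prefix_len:].strip())
    # Everything before "──" is the tree prefix; its length encodes the depth.
    tab, setting = text.split("──", 1)
    key, _, value = setting.partition(":")
    return key, (value.strip() or None), LEVELS.index(len(tab))


def extract_version_sketch(line):
    """Return the version string from a line, or None if absent."""
    match = VERSION_REGEX.search(line)
    return match.group(1) if match else None


# Hypothetical log line: a 22-character prefix followed by a nested setting.
prefix = "0:00:01.234567 INFO:".ljust(22)
sample = prefix + "│   ├──fdr: \x1b[32m0.01\x1b[0m"

parsed = parse_line_sketch(sample)
version = extract_version_sketch("alphadia version: 1.7.2")
print(parsed, version)
```

Unlike `parse_line` in the diff, this sketch returns the value directly instead of a one-item dictionary, and it uses `partition` so that a value containing a colon is not truncated. That is a design choice for the example, not a description of the merged code.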