Merge pull request #201 from wilhelm-lab/release/0.6.1
Release/0.6.1
picciama authored Mar 10, 2024
2 parents a8dcd10 + be2cf27 commit 6623d11
Showing 20 changed files with 1,027 additions and 58 deletions.
2 changes: 1 addition & 1 deletion .cookietemple.yml
@@ -15,5 +15,5 @@ full_name: Victor Giurcoiu
email: victor.giurcoiu@tum.de
project_name: oktoberfest
project_short_description: Public repo oktoberfest
version: 0.6.0
version: 0.6.1
license: MIT
4 changes: 2 additions & 2 deletions .github/release-drafter.yml
@@ -1,5 +1,5 @@
name-template: "0.6.0 🌈" # <<COOKIETEMPLE_FORCE_BUMP>>
tag-template: 0.6.0 # <<COOKIETEMPLE_FORCE_BUMP>>
name-template: "0.6.1 🌈" # <<COOKIETEMPLE_FORCE_BUMP>>
tag-template: 0.6.1 # <<COOKIETEMPLE_FORCE_BUMP>>
exclude-labels:
- "skip-changelog"

1 change: 1 addition & 0 deletions .gitignore
@@ -145,6 +145,7 @@ hash.file
# output files in tutorials folder
tutorials/
!tutorials/Oktoberfest Tutorial.ipynb
!tutorials/Oktoberfest_workshop.ipynb

# example data
data/
2 changes: 1 addition & 1 deletion cookietemple.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.6.0
current_version = 0.6.1

[bumpversion_files_whitelisted]
init_file = oktoberfest/__init__.py
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -54,9 +54,9 @@
# the built documents.
#
# The short X.Y version.
version = "0.6.0"
version = "0.6.1"
# The full version, including alpha/beta/rc tags.
release = "0.6.0"
release = "0.6.1"

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
2 changes: 1 addition & 1 deletion docs/config.rst
@@ -22,7 +22,7 @@ Always required
+----------------------------+--------------------------------------------------------------------------------------------------------------------+
| irt | Name of the model used for indexed retention time prediction |
+----------------------------+--------------------------------------------------------------------------------------------------------------------+
| prediction_server | Server and port for obtaining peptide property predictions; default: "koina.proteomicsdb.org:443" |
| prediction_server | Server and port for obtaining peptide property predictions; default: "koina.wilhelmlab.org:443" |
+----------------------------+--------------------------------------------------------------------------------------------------------------------+
| ssl | Use ssl when making requests to the prediction server, can be true or false; default = true |
+----------------------------+--------------------------------------------------------------------------------------------------------------------+
6 changes: 3 additions & 3 deletions docs/jobs.rst
@@ -65,7 +65,7 @@ Example config file:
"intensity": "Prosit_2020_intensity_HCD",
"irt": "Prosit_2019_irt"
},
"prediction_server": "koina.proteomicsdb.org:443",
"prediction_server": "koina.wilhelmlab.org:443",
"numThreads": 1,
"regressionMethod": "spline",
"ssl": true,
@@ -138,7 +138,7 @@ Example config file:
"specialAas": "KR",
"db": "concat"
},
"prediction_server": "koina.proteomicsdb.org:443",
"prediction_server": "koina.wilhelmlab.org:443",
"numThreads": 1,
"ssl": true
}
@@ -190,7 +190,7 @@ Example config file:
"intensity": "Prosit_2020_intensity_HCD",
"irt": "Prosit_2019_irt"
},
"prediction_server": "koina.proteomicsdb.org:443",
"prediction_server": "koina.wilhelmlab.org:443",
"numThreads": 1,
"fdr_estimation_method": "mokapot",
"allFeatures": false,
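The three example configs above share the same prediction settings. A minimal sketch of writing such a config from Python, limited to keys that appear in the snippets above (values are illustrative, not a recommended setup):

import json

# Illustrative config using only keys shown in the docs above; adjust values to your data.
config = {
    "models": {
        "intensity": "Prosit_2020_intensity_HCD",
        "irt": "Prosit_2019_irt",
    },
    "prediction_server": "koina.wilhelmlab.org:443",
    "ssl": True,
    "numThreads": 1,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)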
20 changes: 10 additions & 10 deletions docs/predictions.rst
@@ -1,18 +1,18 @@
Retrieving predictions
======================

Oktoberfest relies on retrieving predictions from a `Koina <https://koina.proteomicsdb.org/>`_ server that hosts specific models for peptide property prediction. Users can use any publicly available community server or host their own server.
Oktoberfest relies on retrieving predictions from a `Koina <https://koina.wilhelmlab.org/>`_ server that hosts specific models for peptide property prediction. Users can use any publicly available community server or host their own server.

Connecting to a community server
--------------------------------

Our publicly available community server is available at `koina.proteomicsdb.org:443`.
Our publicly available community server is available at `koina.wilhelmlab.org:443`.
If you want to connect to it, you need to have the following flags in your config file (default settings):

.. code-block:: json
{
"prediction_server": "koina.proteomicsdb.org:443",
"prediction_server": "koina.wilhelmlab.org:443",
"ssl": true,
}
@@ -31,13 +31,13 @@ This is the list of currently supported and tested models for Oktoberfest provid
+==================================================================================================================+==============================================================================================================================================================================================+
| Prosit_2019_intensity | Developed for HCD tryptic peptides only. We recommend using the Prosit_2020_intensity_HCD model instead, since it showed slightly superior performance on tryptic peptides as well. |
+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `Prosit_2020_intensity_HCD <https://koina.proteomicsdb.org/docs#post-/Prosit_2020_intensity_HCD/infer>`_ | Developed for HCD tryptic and non-tryptic peptides. Supported modifications are oxidation and carbamidomethylation. Latest version we recommend to use for HCD. |
| `Prosit_2020_intensity_HCD <https://koina.wilhelmlab.org/docs#post-/Prosit_2020_intensity_HCD/infer>`_ | Developed for HCD tryptic and non-tryptic peptides. Supported modifications are oxidation and carbamidomethylation. Latest version we recommend to use for HCD. |
+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `Prosit_2020_intensity_CID <https://koina.proteomicsdb.org/docs#post-/Prosit_2020_intensity_CID/infer>`_ | Developed for CID tryptic and non-tryptic peptides. Supported modifications are oxidation and carbamidomethylation. Latest version we recommend to use for CID. |
| `Prosit_2020_intensity_CID <https://koina.wilhelmlab.org/docs#post-/Prosit_2020_intensity_CID/infer>`_ | Developed for CID tryptic and non-tryptic peptides. Supported modifications are oxidation and carbamidomethylation. Latest version we recommend to use for CID. |
+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `Prosit_2020_intensity_TMT <https://koina.proteomicsdb.org/docs#post-/Prosit_2020_intensity_TMT/infer>`_ | Developed for HCD and CID, tryptic and non-tryptic peptides. Latest version we commend for TMT labeled peptides in general. |
| `Prosit_2020_intensity_TMT <https://koina.wilhelmlab.org/docs#post-/Prosit_2020_intensity_TMT/infer>`_ | Developed for HCD and CID, tryptic and non-tryptic peptides. Latest version we commend for TMT labeled peptides in general. |
+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `Prosit_2023_intensity_timsTOF <https://koina.proteomicsdb.org/docs#post-/Prosit_2023_intensity_timsTOF/infer>`_ | Developed for timsTOF, tryptic and non-tryptic peptides. Latest version we commend to use for timsTOF. |
| `Prosit_2023_intensity_timsTOF <https://koina.wilhelmlab.org/docs#post-/Prosit_2023_intensity_timsTOF/infer>`_ | Developed for timsTOF, tryptic and non-tryptic peptides. Latest version we commend to use for timsTOF. |
+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

.. table::
@@ -46,14 +46,14 @@ This is the list of currently supported and tested models for Oktoberfest provid
+-----------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| iRT models | Description |
+===============================================================================================+===========================================================================================================================+
| `Prosit_2019_irt <https://koina.proteomicsdb.org/docs#post-/Prosit_2019_irt/infer>`_ | While developed for tryptic peptides only, we did not observe a drop in prediction performance for non-tryptic peptides. |
| `Prosit_2019_irt <https://koina.wilhelmlab.org/docs#post-/Prosit_2019_irt/infer>`_ | While developed for tryptic peptides only, we did not observe a drop in prediction performance for non-tryptic peptides. |
+-----------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| `Prosit_2020_irt_TMT <https://koina.proteomicsdb.org/docs/#post-/Prosit_2020_irt_TMT/infer>`_ | Developed for TMT labeled peptides. |
| `Prosit_2020_irt_TMT <https://koina.wilhelmlab.org/docs/#post-/Prosit_2020_irt_TMT/infer>`_ | Developed for TMT labeled peptides. |
+-----------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+

Once support for additional models is implemented in Oktoberfest, they will be added here.

Hosting and adding your own models
----------------------------------

In case you are planning to host your own private or public instance of Koina or want us to host your model, please refer to the official `Koina documentation <https://koina.proteomicsdb.org/docs#overview>`_.
In case you are planning to host your own private or public instance of Koina or want us to host your model, please refer to the official `Koina documentation <https://koina.wilhelmlab.org/docs#overview>`_.
2 changes: 1 addition & 1 deletion oktoberfest/__init__.py
@@ -5,7 +5,7 @@
__author__ = """The Oktoberfest development team (Wilhelmlab at Technical University of Munich)"""
__copyright__ = f"Copyright {datetime.now():%Y}, Wilhelmlab at Technical University of Munich"
__license__ = "MIT"
__version__ = "0.6.0"
__version__ = "0.6.1"

import logging.handlers
import sys
6 changes: 6 additions & 0 deletions oktoberfest/data/spectra.py
@@ -214,6 +214,12 @@ def from_hdf5(cls: Type[SpectraT], input_file: Union[str, Path]) -> SpectraT:
sparse_raw_intensities = hdf5.read_file(input_file, f"sparse_{hdf5.INTENSITY_RAW_KEY}")
if not sparse_raw_intensities.empty:
spectra.add_matrix_from_hdf5(sparse_raw_intensities, FragmentType.RAW)
try:
sparse_pred_intensities = hdf5.read_file(input_file, f"sparse_{hdf5.INTENSITY_PRED_KEY}")
if not sparse_pred_intensities.empty:
spectra.add_matrix_from_hdf5(sparse_pred_intensities, FragmentType.PRED)
except Exception as e:
logger.warning(e)
sparse_raw_mzs = hdf5.read_file(input_file, f"sparse_{hdf5.MZ_RAW_KEY}")
if not sparse_raw_mzs.empty:
spectra.add_matrix_from_hdf5(sparse_raw_mzs, FragmentType.MZ)
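A minimal sketch of what the added try/except enables: reloading a previously written library now restores the sparse predicted intensities when that dataset exists in the HDF5 file and only logs a warning otherwise. The import path and file name below are assumptions based on this repository's layout:

from oktoberfest.data.spectra import Spectra  # import path assumed from the file layout in this diff

# Hypothetical file from an earlier run: predicted intensities are loaded if present,
# otherwise a warning is logged and the rest of the library is read as before.
library = Spectra.from_hdf5("results/data/example.mzml.pred.hdf5")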
17 changes: 13 additions & 4 deletions oktoberfest/plotting/plotting.py
@@ -118,11 +118,16 @@ def joint_plot(
height=10,
joint_kws={"rasterized": True, "edgecolor": "none", "s": 10},
)
jplot.ax_joint.axhline(y=0, c="red")
jplot.ax_joint.axvline(x=0, c="red")
jplot.ax_marg_y.axhline(y=0, c="red")
jplot.ax_marg_x.axvline(x=0, c="red")

jplot.ax_joint.set_ylabel("Score\n(peptide property prediction)")
jplot.ax_joint.set_xlabel("Score\n(search engine)")
jplot.fig.suptitle(f"Score distribution ({level.capitalize()})", y=0.99)
plt.savefig(filename, dpi=300)
plt.plot()
plt.show()
plt.close()


@@ -196,7 +201,7 @@ def plot_gain_loss(prosit_target: pd.DataFrame, andromeda_target: pd.DataFrame,
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
# grid
ax.set_ylabel("number of lost-common-shared targets below 1% FDR")
ax.set_ylabel(f"number of target {level.lower()}s below 1% FDR")
ax.set_axisbelow(True)
ax.yaxis.grid(color="black")
ax.tick_params(axis="y", which="major")
@@ -207,7 +212,7 @@
legend_label = ["Common", "Gained", "Lost"]
plt.legend(legend_label, ncol=1, bbox_to_anchor=([1.2, 0.5, 0, 0]), frameon=False)
plt.savefig(filename, dpi=300, bbox_inches="tight")
plt.plot()
plt.show()
plt.close()


@@ -236,7 +241,11 @@ def plot_violin_sa_ce(sa_ce_df: pd.DataFrame, filename: Union[str, Path]):
"""
fig, ax = plt.subplots(figsize=(8, 8))
sns.violinplot(data=sa_ce_df, x="COLLISION_ENERGY", y="SPECTRAL_ANGLE", ax=ax, color="#1f77b4")
ax.axvline(x=sa_ce_df["COLLISION_ENERGY"][sa_ce_df["SPECTRAL_ANGLE"].idxmax()], color="red")
ax.axvline(
x=sa_ce_df["COLLISION_ENERGY"][sa_ce_df["SPECTRAL_ANGLE"].idxmax()] - sa_ce_df["COLLISION_ENERGY"].min(),
color="red",
)
plt.xticks(rotation=90)
plt.grid()
plt.savefig(filename, dpi=300)
plt.plot()
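The violin-plot change compensates for seaborn drawing categorical x values at integer positions 0..n-1: with consecutive integer collision energies, the position of the best-scoring energy is its value minus the smallest energy. A standalone sketch with toy data (assumed, not taken from the pipeline):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy data: five consecutive collision energies, spectral angle peaking at 32.
sa_ce_df = pd.DataFrame({
    "COLLISION_ENERGY": [28, 29, 30, 31, 32] * 20,
    "SPECTRAL_ANGLE": [i / 100 for i in range(100)],
})

fig, ax = plt.subplots(figsize=(8, 8))
sns.violinplot(data=sa_ce_df, x="COLLISION_ENERGY", y="SPECTRAL_ANGLE", ax=ax, color="#1f77b4")

# seaborn places the categories at positions 0..4, so the best energy (32) has to be
# drawn at position 32 - 28 = 4 rather than at x=32, exactly as in the fix above.
best_ce = sa_ce_df["COLLISION_ENERGY"][sa_ce_df["SPECTRAL_ANGLE"].idxmax()]
ax.axvline(x=best_ce - sa_ce_df["COLLISION_ENERGY"].min(), color="red")
plt.show()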
6 changes: 3 additions & 3 deletions oktoberfest/predict/koina.py
@@ -34,7 +34,7 @@ class Koina:
def __init__(
self,
model_name: str,
server_url: str = "koina.proteomicsdb.org:443",
server_url: str = "koina.wilhelmlab.org:443",
ssl: bool = True,
targets: Optional[List[str]] = None,
disable_progress_bar: bool = False,
@@ -49,7 +49,7 @@ def __init__(
and that the specified model is available on the server.
:param model_name: The name of the Koina model to be used for inference.
:param server_url: The URL of the inference server. Defaults to "koina.proteomicsdb.org:443".
:param server_url: The URL of the inference server. Defaults to "koina.wilhelmlab.org:443".
:param ssl: Indicates whether to use SSL for communication with the server. Defaults to True.
:param targets: An optional list of targets to predict. If this is None, all model targets are
predicted and received.
@@ -100,7 +100,7 @@ def _is_server_ready(self):
if not self.client.is_server_live():
raise ValueError("Server not yet started.")
except InferenceServerException as e:
if self.url == "koina.proteomicsdb.org:443":
if self.url in ["koina.wilhelmlab.org:443", "koina.proteomicsdb.org:443"]:
if self.ssl:
raise InferenceServerException(
"The public koina network seems to be inaccessible at the moment. "
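A minimal usage sketch against the new default endpoint. The constructor arguments match the signature shown above; the import path is an assumption based on this repository's layout, and a reachable server is required for the readiness check to pass:

from oktoberfest.predict.koina import Koina  # import path assumed

# Connect to the public community server; ssl=True matches the default above.
client = Koina(model_name="Prosit_2019_irt", server_url="koina.wilhelmlab.org:443", ssl=True)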
2 changes: 1 addition & 1 deletion oktoberfest/preprocessing/preprocessing.py
@@ -180,7 +180,7 @@ def filter_peptides(peptides: pd.DataFrame, min_length: int, max_length: int, ma
& (~peptides["MODIFIED_SEQUENCE"].str.contains(r"\(ac\)"))
& (~peptides["MODIFIED_SEQUENCE"].str.contains(r"\(Acetyl \(Protein N-term\)\)"))
& (~peptides["MODIFIED_SEQUENCE"].str.contains(r"\[UNIMOD\:21\]"))
& (~peptides["SEQUENCE"].str.contains("U|X"))
& (~peptides["SEQUENCE"].str.contains(r"B|\*|\.|U|X|Z"))
]


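A small sketch (toy data, not from the pipeline) of what the widened filter removes: besides U and X, sequences containing B, Z, '*' or '.' are now dropped as well:

import pandas as pd

peptides = pd.DataFrame({"SEQUENCE": ["PEPTIDEK", "PEPTBIDEK", "PEPT*IDEK", "PEPTUIDEK", "PEPTZIDEK"]})

# Same regex as in the updated filter above; only sequences made of standard residues survive.
kept = peptides[~peptides["SEQUENCE"].str.contains(r"B|\*|\.|U|X|Z")]
print(kept["SEQUENCE"].tolist())  # ['PEPTIDEK']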
47 changes: 27 additions & 20 deletions oktoberfest/runner.py
@@ -351,6 +351,14 @@ def generate_spectral_lib(config_path: Union[str, Path]):
# Create a pool for producer processes
predictor_pool = pool.Pool(config.num_threads)

consumer_process = Process(
target=speclib.async_write,
args=(
shared_queue,
writing_progress,
),
)

try:
results = []
for i in batches:
@@ -376,13 +384,7 @@
total=n_batches, desc="Writing library", postfix={"successful": 0, "missing": 0}
) as writer_pbar:
# Start the consumer process
consumer_process = Process(
target=speclib.async_write,
args=(
shared_queue,
writing_progress,
),
)

consumer_process.start()
with tqdm(
total=n_batches, desc="Getting predictions", postfix={"successful": 0, "failed": 0}
@@ -475,23 +477,28 @@ def _calculate_features(spectra_file: Path, config: Config):
if calc_feature_step.is_done():
return

predict_kwargs = {
"server_url": config.prediction_server,
"ssl": config.ssl,
}
predict_step = ProcessStep(config.output, "predict." + spectra_file.stem)
if not predict_step.is_done():

pred_intensities = pr.predict(
data=library.spectra_data,
model_name=config.models["intensity"],
**predict_kwargs,
)
predict_kwargs = {
"server_url": config.prediction_server,
"ssl": config.ssl,
}

pred_irts = pr.predict(data=library.spectra_data, model_name=config.models["irt"], **predict_kwargs)
pred_intensities = pr.predict(
data=library.spectra_data,
model_name=config.models["intensity"],
**predict_kwargs,
)

library.add_matrix(pd.Series(pred_intensities["intensities"].tolist(), name="intensities"), FragmentType.PRED)
library.add_column(pred_irts["irt"], name="PREDICTED_IRT")
pred_irts = pr.predict(data=library.spectra_data, model_name=config.models["irt"], **predict_kwargs)

library.write_pred_as_hdf5(config.output / "data" / spectra_file.with_suffix(".mzml.pred.hdf5").name).join()
library.add_matrix(pd.Series(pred_intensities["intensities"].tolist(), name="intensities"), FragmentType.PRED)
library.add_column(pred_irts["irt"], name="PREDICTED_IRT")

library.write_pred_as_hdf5(config.output / "data" / spectra_file.with_suffix(".mzml.pred.hdf5").name).join()

predict_step.mark_done()

# produce percolator tab files
fdr_dir = config.output / "results" / config.fdr_estimation_method
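Two things change in runner.py: the writer Process is now constructed before the try block, presumably so that later cleanup code can reference it even if batching fails before the progress bars are set up (the except/finally handling lies outside the displayed hunks), and predict_kwargs plus the prediction calls move inside the is_done() guard so a previously completed step skips them entirely. A self-contained sketch of the first pattern with a toy worker (names and queue payloads are illustrative, not the library's API):

from multiprocessing import Process, Queue


def consume(queue):
    # Toy consumer: drain the queue until the sentinel arrives.
    while True:
        item = queue.get()
        if item is None:
            break
        print("writing", item)


if __name__ == "__main__":
    queue = Queue()
    # Construct the consumer before entering the try block so the cleanup code
    # below can always reference it, mirroring the restructuring above.
    consumer = Process(target=consume, args=(queue,))
    try:
        consumer.start()
        for batch in range(3):
            queue.put(batch)
    finally:
        queue.put(None)  # sentinel: always tell the consumer to stop
        if consumer.is_alive():
            consumer.join()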
Remaining changed files not shown.
