Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add protein columns to alphadia and spectronaut TOMLs #445

Merged
merged 4 commits into from
Nov 24, 2024

Conversation

rodvrees
Copy link
Contributor

@rodvrees rodvrees commented Nov 21, 2024

Because some DIA search engines parse fasta headers themselves and sometimes lose species info in output, we previously implemented a mapping function to map gene names to correct fasta header info, including species info. This way we added a new protein column to the intermediate format, which negated the need to add a specified protein column in the search engine-specific TOML (because default name was used).

As per @mlocardpaulet's suggestion, we should still include protein column name information in the tomls, to avoid confusion as to which column exactly is used.
The easiest fix would be to just keep the functionality the same, but do the mapping within the original column instead of making a new "Proteins" column. In this way, the original column name can be specified in the .toml

Let me know if this approach is okay

@mlocardpaulet
Copy link
Contributor

Sounds good to me.
Thinking about it, we could also consider doing this mapping for all the pipelines (if we want to homogenise the process).
I mean...
We would take the protein column (as identified from the toml), and:

  1. check if the species are there:
    if yes -> take this information as species
    if no -> do the mapping with the mapping table (accession -> species).

I suggest that without really knowing what the code looks like so feel free to completely discard this suggestion. I think that one good thing if we do it this way is that if users set the accession parsing differently, it will still work.
Would this make sense?
If it is too much work for little reward, please do what you suggested.

@rodvrees
Copy link
Contributor Author

We could probably generalize it a bit, e.g. add an entry to the .toml that specifies if the mapper needs to be used or not. That might indeed be cleaner.

@mlocardpaulet
Copy link
Contributor

We could probably generalize it a bit, e.g. add an entry to the .toml that specifies if the mapper needs to be used or not. That might indeed be cleaner.

you don't think that this can be checked automatically? I mean... If there is no "_HUMAN" (or another species), the mapper is needed.

@RobbinBouwmeester
Copy link
Contributor

@rodvrees ready for review?

@rodvrees
Copy link
Contributor Author

Sure, and then I'll make a new issue/PR to generalize the mapping functionality

@rodvrees rodvrees marked this pull request as ready for review November 23, 2024 14:38
@RobbinBouwmeester RobbinBouwmeester merged commit 662c38b into main Nov 24, 2024
8 checks passed
@RobbinBouwmeester RobbinBouwmeester deleted the improve-toml-spectronaut branch November 24, 2024 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants