Different MaxQuant versions identify vastly different number of Protein groups via MS/MS in the same sample #110

Open
Johanhw-ms opened this issue Aug 16, 2024 · 9 comments

@Johanhw-ms

I am new to analyzing mass spectrometry data, so for my first experiment (an IP experiment with trypsin digestion), we let our mass spec core facility (the experts) run MaxQuant on the raw data for us. We used match between runs + LFQ.
The core facility uses MaxQuant version 2.4.7.0, which identified more than 3000 protein groups in my sample, of which more than 1000 were identified via MS/MS in each sample. The enriched proteins in the Perseus volcano plot also "made sense", in that the antigen I was trying to precipitate was enriched.

I wanted to recreate this result myself, so I downloaded the newest version of MaxQuant, and ran the same raw data using the same MaxQuant parameters. This time, only about half of the protein groups were identified!
When I filtered for protein groups identified via MS/MS, some of the samples had very few hits (under 100). The antigen I was enriching had also disappeared from the list of identified proteins.

This led to a lot of troubleshooting:

  1. MaxQuant version 2.4.7.0 (which runs on .NET core 3.1 and .NET framework 4.7.2) gives us the results we expected (3000+ identified protein groups, 900+ protein groups identified via MS/MS per sample)
  2. MaxQuant version 2.5.0.0 (which runs on the newer .NET 7.0) gives us half as many identified protein groups, and one sample has as few as 23 protein groups identified via MS/MS!
  3. MaxQuant version 2.6.3.0 (the newest version, running on .NET 8.0) gives results that look like the ones from v2.5.0.0
  4. We ran the same raw data on PEAKS, and their results look a lot more like the ones from MaxQuant v.2.4.7.0
  5. We took the mqpar file from v2.5.0.0 and ran it using v.2.4.7.0, just to be 100% sure it wasn't a change in default parameters or something. The results from this run look just like the last time we ran 2.4.7.0.
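
For anyone who wants to reproduce the per-sample comparison in the list above, here is a minimal Python sketch (not something provided by the MaxQuant team). It assumes that proteinGroups.txt is tab-separated and has one "Identification type <sample>" column per sample, with values such as "By MS/MS" or "By matching". Please check the column names in your own output, since they may differ between versions. The function name and the demo table are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column prefix in proteinGroups.txt; verify against your own file.
ID_PREFIX = "Identification type "


def count_msms_groups(handle):
    """Count, per sample, the protein groups whose identification type is 'By MS/MS'."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        for column, value in row.items():
            if column and column.startswith(ID_PREFIX):
                sample = column[len(ID_PREFIX):]
                # Adding 0 still registers the sample, so samples without hits show up too.
                counts[sample] += int(value == "By MS/MS")
    return dict(counts)


# Tiny made-up table standing in for a real proteinGroups.txt.
demo = io.StringIO(
    "Protein IDs\tIdentification type A\tIdentification type B\n"
    "P1\tBy MS/MS\tBy matching\n"
    "P2\tBy MS/MS\t\n"
    "P3\tBy matching\tBy MS/MS\n"
)
print(count_msms_groups(demo))
```

For a real run, the same function can be called on `open("proteinGroups.txt", newline="")` from the output of each MaxQuant version, and the resulting dictionaries compared side by side.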

OS: Windows 10 (we also tried Linux - Ubuntu 22.04)

To us, it seems like MaxQuant version 2.5.0.0 and newer somehow analyses our data differently, and this leads to us losing ca. half of the identifiable protein groups.
Does anyone have any idea what could cause this? Is it .NET?
I would have assumed that if there was a problem with the dependencies then MaxQuant would just crash, not produce totally different results. We are at our wits' end, I'm afraid.

Best regards,
Johanna

@WalterViegener
Collaborator

WalterViegener commented Aug 21, 2024

Dear Johanna,
Could you please share your data with us at this link? It will be handled confidentially. If possible please also upload all mqpar files of the mentioned versions.

Also please let us know if it is DDA/DIA and what other settings might be important.

Best,
Walter

@Johanhw-ms
Author

Johanhw-ms commented Aug 22, 2024 via email

@WalterViegener
Collaborator

Dear Johanna,
Thanks for providing your data. Please upload it again to this folder.
Also, based on your current upload, I think each folder had the ending .d?
If so, please create a .zip of each .d folder and upload that. I would also appreciate the mqpar files and the FASTA files you used for your tests.

@WalterViegener WalterViegener self-assigned this Aug 22, 2024
@Johanhw-ms
Author

Johanhw-ms commented Aug 22, 2024 via email

@Johanhw-ms
Author

Johanhw-ms commented Sep 4, 2024 via email

@JinqiuXiao

JinqiuXiao commented Sep 5, 2024

Hi Johanna,

This is Jinqiu, Walter's colleague. He is still on holiday, so I have taken over this issue. We haven't changed much in the DDA analysis over the past releases, so I tried a few parameters and found out what is going on. MaxQuant has a parameter called "Split protein groups by taxonomy ID" (as shown in the screenshot), which you can find under "Global parameters -> Identification". In a multi-species dataset, if you take the proteomes from UniProt, a taxonomy ID is automatically assigned to each species, and during identification proteins from different species are then not combined into the same protein group.

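[Screenshot: the "Split protein groups by taxonomy ID" option under "Global parameters -> Identification"]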

In 2.4.7 it was off by default, but we switched it on for the .NET 7 and .NET 8 versions. Your dataset is a special case: you have two databases for Homo sapiens, and one of them is not from UniProt, so the taxonomy ID was not recognized properly. This caused confusion during identification and led to fewer identifications. Since your dataset only contains Homo sapiens sequences, you can switch the parameter off. I ran the latest MQ version with default settings and, as you mentioned, identified only around 1500 protein groups; with this parameter switched off I identified 3300 protein groups. Please try switching it off and running it again, and let us know how it works :)
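
If you prefer to change the setting in the parameter file rather than in the GUI, the following is a rough Python sketch of how that could look. It assumes the mqpar file is plain XML and that the setting is stored in an element named `splitProteinGroupsByTaxonomy` with the text `True` or `False`. The helper name and the demo XML are made up for illustration, so please check against your own mqpar file first.

```python
import xml.etree.ElementTree as ET

# Element name assumed from the mqpar file; verify it in your own copy.
TAG = "splitProteinGroupsByTaxonomy"


def set_split_by_taxonomy(root, enabled):
    """Set every matching element under root and return how many were found."""
    found = 0
    for node in root.iter(TAG):
        node.text = "True" if enabled else "False"
        found += 1
    return found


# Made-up stand-in for a real mqpar file, reduced to the one setting of interest.
demo = ET.fromstring(
    "<MaxQuantParams>"
    "<splitProteinGroupsByTaxonomy>True</splitProteinGroupsByTaxonomy>"
    "</MaxQuantParams>"
)
print(set_split_by_taxonomy(demo, False), demo.find(TAG).text)

# For a real file (hypothetical file names), write to a new file and keep the original:
# tree = ET.parse("mqpar.xml")
# set_split_by_taxonomy(tree.getroot(), False)
# tree.write("mqpar_nosplit.xml", encoding="utf-8", xml_declaration=True)
```

A return value of 0 would mean the element was not found, in which case nothing was changed and the option should be set in the GUI instead.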

We'll look into this issue afterwards, thank you for letting us know about it!

Best,
Jinqiu

@Johanhw-ms
Author

Johanhw-ms commented Sep 9, 2024 via email

@Johanhw-ms
Author

Johanhw-ms commented Sep 10, 2024 via email

@JinqiuXiao

Hi Johanna,

Sorry for the late reply, we were busy with the MaxQuant summer school last week. I ran the test on your dataset with MQ 2.6.3 (2.6.4 had not been released at that time). With "split by taxonomy" unchecked I get around 3300 protein groups; with it checked, which was the only change I made, I identified 1500 proteins.

You can find my results and the corresponding mqpar files at this link. The password is the full name of your raw file, "gC*****_3290". If you compare the two mqpar files, the only difference is True versus False for "splitProteinGroupsByTaxonomy".
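
To check that two mqpar files really differ in only that one setting, something like the following Python sketch could be used. It assumes the files are plain XML; the function names and the small demo documents are made up for illustration and only stand in for real mqpar files.

```python
import xml.etree.ElementTree as ET


def flatten(node, prefix="", index=0):
    """Map each leaf element's indexed path to its stripped text."""
    here = f"{prefix}/{node.tag}[{index}]"
    children = list(node)
    if not children:
        return {here: (node.text or "").strip()}
    leaves, seen = {}, {}
    for child in children:
        # Number siblings with the same tag so repeated entries get distinct paths.
        i = seen.get(child.tag, 0)
        seen[child.tag] = i + 1
        leaves.update(flatten(child, here, i))
    return leaves


def diff_params(xml_a, xml_b):
    """Return {path: (value_in_a, value_in_b)} for every leaf that differs."""
    a = flatten(ET.fromstring(xml_a))
    b = flatten(ET.fromstring(xml_b))
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Made-up stand-ins for the two mqpar files.
split_on = (
    "<MaxQuantParams>"
    "<splitProteinGroupsByTaxonomy>True</splitProteinGroupsByTaxonomy>"
    "<numThreads>4</numThreads>"
    "</MaxQuantParams>"
)
split_off = split_on.replace(">True<", ">False<")
for path, (left, right) in diff_params(split_on, split_off).items():
    print(path, left, "->", right)
```

For real files, the contents can be read with `open(path, encoding="utf-8").read()` and passed to `diff_params`. Attributes are ignored in this sketch, so it only compares element text.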

One thing you can try before you rerun the test is to delete all intermediate files that were created during the database search: the folders that have the same names as your raw files (they contain n0 and p0 folders) and the .index files. When you start the database search, MQ looks for these intermediate files and skips some steps to save time, so it is always good to remove them before rerunning the search.
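
The cleanup could be sketched in Python roughly as below. This is only an illustration under assumptions: that the raw data end in .d or .raw, that the intermediate folder sits next to the raw data and is named like it without the extension, and that the .index files are in the same directory. The function names are made up. The sketch only lists candidates, so you can check the list before deleting anything.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed raw data extensions; adjust to your instrument's format.
RAW_SUFFIXES = {".d", ".raw"}


def find_intermediates(raw_dir):
    """List folders named like a raw file (minus extension) plus all .index files."""
    raw_dir = Path(raw_dir)
    stems = {p.stem for p in raw_dir.iterdir() if p.suffix.lower() in RAW_SUFFIXES}
    found = []
    for entry in sorted(raw_dir.iterdir()):
        if entry.is_dir() and entry.name in stems:
            found.append(entry)
        elif entry.is_file() and entry.suffix == ".index":
            found.append(entry)
    return found


def remove_intermediates(paths):
    """Delete the listed folders and files; call only after reviewing the list."""
    for path in paths:
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()


# Demo on a throwaway directory with made-up names.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sample1.d").mkdir()          # raw data folder, must be kept
    (base / "sample1" / "n0").mkdir(parents=True)
    (base / "sample1.index").write_text("")
    (base / "notes.txt").write_text("")
    print([p.name for p in find_intermediates(base)])
```

Listing first and deleting in a separate step keeps the raw data safe if the folder layout turns out to be different from what is assumed here.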

Best,
Jinqiu
