Reproduction of the results in the paper #1
Thank you for your interest! How big are the differences you are seeing? Assuming they are not that large, it is probably due to using slightly different GPUs / drivers. Note that GPU computations generally do not give exactly the same results, especially across different hardware models / drivers, because accumulation and rounding happen in slightly different orders. Since GPTQ accumulates the results of a very large number of GPU operations in multiple places, these very small differences add up and can lead to slightly different final results. Comparing, for example, the results on some OPT 4-bit models for PTB between an A100, an A6000, and a 3090: the A100 results are precisely the numbers reported in the paper, whereas the other GPUs produce slightly different results, especially for small models, with the gaps shrinking as model size increases.
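To illustrate the accumulation-order point, here is a minimal PyTorch sketch (not from the GPTQ codebase) showing that summing the exact same float32 values in two different orders already yields slightly different results:

```python
import torch

torch.manual_seed(0)
x = torch.randn(1_000_000, dtype=torch.float32)

# One accumulation order: a single straight reduction.
s_flat = x.sum()

# A different accumulation order: partial sums over 1000 chunks,
# then a reduction over the partials.
s_chunked = torch.stack([c.sum() for c in x.chunk(1000)]).sum()

print(f"flat:       {s_flat.item():.8f}")
print(f"chunked:    {s_chunked.item():.8f}")
print(f"difference: {(s_flat - s_chunked).abs().item():.2e}")
```

Different GPUs and driver versions effectively pick different accumulation orders inside their kernels, which is why the tiny discrepancies above compound across the many reductions GPTQ performs.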
Thank you @efrantar! The set of GPUs I have access to does not overlap with yours, though. Here are my results for OPT 4-bit models on PTB, comparable to yours above:

Further questions:
@efrantar
Following the instructions in README.md, the baseline and RTN perplexities match exactly the numbers listed in Tables 2-3 of the paper. However, the GPTQ perplexity does not.
Is this due to differences in the calibration samples? Or are the results in the tables statistics over multiple runs with different random seeds?
Could you share the command that reproduces the results in the paper?
Much appreciated!
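As a side note on the calibration question: one way to rule out calibration-sampling randomness is to pin all RNG seeds before drawing the calibration set. A minimal sketch follows; the helper name `set_all_seeds` is illustrative, not the repository's API:

```python
import random

import numpy as np
import torch

def set_all_seeds(seed: int = 0) -> None:
    # Pin every RNG that could influence calibration sampling.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_all_seeds(0)
# ...then draw the calibration samples. With identical seeds across runs,
# any remaining perplexity gap should stem from GPU/driver-level
# nondeterminism rather than from the calibration data itself.
```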