add docs TypicalLogitsWarper #25140
Conversation
@gante let me know the changes
Thank you for opening the PR 🙌
A few nits to improve information density in the docs, and to improve the example
You also need to run make fixup.
See the note in the example -- and don't forget to run make fixup
on your end before the commit, otherwise our tests will block your PR! 🤗
Thank you for iterating :)
You still need to run make fixup and commit the resulting changes; CI is complaining about formatting.
>>> # Set up the warper with desired parameters
>>> warper = TypicalLogitsWarper(tikohn_n=3, pi=0.95)
Suggested change (remove these lines):
- >>> # Set up the warper with desired parameters
- >>> warper = TypicalLogitsWarper(tikohn_n=3, pi=0.95)
(this is not needed, since we set typical_p in generate)
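As a rough illustration of the point above (the model, prompt, and parameter values here are placeholders, not part of the PR): passing typical_p to generate builds the typical-decoding warper internally, so no explicit TypicalLogitsWarper instance is needed.

>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> # Illustrative model and prompt only; any causal LM would do
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> input_ids = tokenizer("Today is a nice day and", return_tensors="pt").input_ids

>>> # No `warper = TypicalLogitsWarper(...)` here: generate sets it up from typical_p
>>> output = model.generate(input_ids, do_sample=True, max_length=50, typical_p=0.9)
>>> print(tokenizer.decode(output[0], skip_special_tokens=True))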
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for adding this! 🤗
Just some nits on the formatting
The proportion of probability mass to retain while warping the logits. The value should be between 0 and 1.
Higher values (close to 1.0) retain more probability mass, leading to more typical sampling, whereas lower
values (close to 0.0) retain less probability mass, leading to more diverse sampling. The default is 0.9.
nit: We don't use the extra indent elsewhere for continued paragraphs
Suggested change:
- The proportion of probability mass to retain while warping the logits. The value should be between 0 and 1.
- Higher values (close to 1.0) retain more probability mass, leading to more typical sampling, whereas lower
- values (close to 0.0) retain less probability mass, leading to more diverse sampling. The default is 0.9.
+ The proportion of probability mass to retain while warping the logits. The value should be between 0 and 1.
+ Higher values (close to 1.0) retain more probability mass, leading to more typical sampling, whereas lower
+ values (close to 0.0) retain less probability mass, leading to more diverse sampling.
The value used to filter out logits that fall below this threshold. Any logits less than this value will be
set to -infinity before applying the softmax function. This helps in excluding unlikely tokens during sampling.
Default is -infinity.
nit: same as above re the indent. We also don't need to say the default here, since it's already given in the type info as ", defaults to ..." above.
Suggested change:
- The value used to filter out logits that fall below this threshold. Any logits less than this value will be
- set to -infinity before applying the softmax function. This helps in excluding unlikely tokens during sampling.
- Default is -infinity.
+ The value used to filter out logits that fall below this threshold. Any logits less than this value will be
+ set to -infinity before applying the softmax function. This helps in excluding unlikely tokens during sampling.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
The minimum number of tokens to always keep during sampling. The default is 1.
Suggested change:
- The minimum number of tokens to always keep during sampling. The default is 1.
+ The minimum number of tokens to always keep during sampling.
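As a side note, here is a minimal sketch of what these arguments do when the warper is applied directly to a toy batch of scores (the numbers and the dummy input_ids are made up for illustration; during generate this all happens internally):

>>> import torch
>>> from transformers import TypicalLogitsWarper

>>> # Toy scores for a 5-token vocabulary; the values are arbitrary
>>> scores = torch.tensor([[1.0, 2.0, 3.0, 4.0, 5.0]])
>>> dummy_input_ids = torch.tensor([[0]])  # not used by this warper, but required by the call signature

>>> # mass plays the role of typical_p; tokens outside the typical set are replaced by filter_value,
>>> # and at least min_tokens_to_keep tokens always survive the filtering
>>> warper = TypicalLogitsWarper(mass=0.9, filter_value=-float("inf"), min_tokens_to_keep=1)
>>> warped_scores = warper(dummy_input_ids, scores)  # filtered-out positions now hold -inf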
>>> # Generate text using the model and warper
>>> typical_p = 0.9
>>> output = model.generate(input_ids, do_sample=True, max_length=50, typical_p=typical_p)
It would be nice (but not necessary) to have two examples here with high and low typical_p values to demonstrate its effects
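For what it's worth, a sketch of how such a side-by-side could look, assuming model, tokenizer, and input_ids were set up earlier in the example (the values 0.95 and 0.2 are arbitrary choices for illustration):

>>> # A high typical_p keeps most of the probability mass (a larger candidate set) ...
>>> output_high = model.generate(input_ids, do_sample=True, max_length=50, typical_p=0.95)
>>> # ... while a low typical_p restricts sampling to the smallest, most locally typical set of tokens
>>> output_low = model.generate(input_ids, do_sample=True, max_length=50, typical_p=0.2)
>>> print(tokenizer.decode(output_high[0], skip_special_tokens=True))
>>> print(tokenizer.decode(output_low[0], skip_special_tokens=True))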
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Added a docstring to TypicalLogitsWarper, with some examples as well.
Related to #24783
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.