Add footnote warning to hashing a hash #411

Open. Wants to merge 3 commits into base: main. Changes from 2 commits.
20 changes: 16 additions & 4 deletions README.rst
@@ -217,20 +217,29 @@ As of 3.0.0 the ``$2y$`` prefix is still supported in ``hashpw`` but deprecated.
Maximum Password Length
~~~~~~~~~~~~~~~~~~~~~~~

The bcrypt algorithm only handles passwords up to 72 characters, any characters
The bcrypt algorithm only handles passwords up to 72 characters; any characters
beyond that are ignored. To work around this, a common approach is to hash a
password with a cryptographic hash (such as ``sha256``) and then base64
encode it to prevent NULL byte problems before hashing the result with
password with a keyed cryptographic hash (such as ``bcrypt_pbkdf``) and then
base64 encode it to prevent NULL byte problems before hashing the result with
``bcrypt``:

.. code:: pycon

    >>> import base64
    >>> import bcrypt
    >>> password = b"an incredibly long password" * 10
    >>> hashed = bcrypt.hashpw(
    ...     base64.b64encode(hashlib.sha256(password).digest()),
    ...     base64.b64encode(bcrypt.kdf(password=password,


Ah, I meant using `bcrypt.kdf` directly instead of `bcrypt.hashpw`. This works, but it… feels unusual? If people want to keep the descriptive hash format that `hashpw` creates, HMAC seems simpler (although I guess the only concrete difference is that you don’t have to make an arbitrary choice for the number of output bytes and rounds).

Author:

I see what you mean. I just literally placed one in the other, but there are more intuitive ways of doing this.

Do you mean `base64.b64encode(bcrypt.kdf(password=password, salt=bcrypt.gensalt(), <whatever>))`? Would password verification simply be to repeat the procedure and compare strings, or is there an analogue to `hashpw`?

And with the HMAC one, do you mean `bcrypt.hashpw(base64.b64encode(hmac.digest(pepper, password, "sha256")), bcrypt.gensalt())`?

If so, I suppose another important difference is that the latter requires storing a pepper (or per-hash salt) separately.

Author (@FWDekker, Aug 29, 2023):

I had forgotten about this PR, but it's now almost a year old. Luckily(?) the relevant section in the README has not been updated, so the changes here are still relevant.

Do you perhaps have time to take a look at my question to see if I understood your suggestion correctly?


> Do you mean `base64.b64encode(bcrypt.kdf(password=password, salt=bcrypt.gensalt(), <whatever>))`? Would password verification simply be to repeat the procedure and compare strings, or is there an analogue to `hashpw`?
>
> And with the HMAC one, do you mean `bcrypt.hashpw(base64.b64encode(hmac.digest(pepper, password, "sha256")), bcrypt.gensalt())`?

Yes and yes.

On second thought, though… maybe the best answer is replacing the code with your original footnote, and pointing back to “but you should really use argon2id” at the top of the README?

Author:

lol, good point. I've made sure to emphasise that not using bcrypt is probably the best solution. However, I'm sure there will be people who will somehow be forced to use bcrypt, either because of legacy software, weird interoperability, legal requirements, incompetent management, you name it. So I think it is still worthwhile to explain how to work around the length limitation. I've also chosen to use the HMAC variant because if a reader doesn't already know about hash shucking and peppers, I think it's unlikely they'll manage to choose reasonable numbers of rounds and bytes.

Let me know what you think of the rewritten section.

    ...                                 salt=pepper,
    ...                                 desired_key_bytes=32,
    ...                                 rounds=100)),
    ...     bcrypt.gensalt()
    ... )
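
As a rough illustration of why the base64 step is there: a raw SHA-256 digest is 32 arbitrary bytes, so, assuming uniformly distributed output, roughly 12% of digests (1 - (255/256)^32) contain at least one NUL byte, which some bcrypt implementations treat as the end of the password. The sketch below uses only the standard library; the candidate passwords and the helper name ``digest_has_nul`` are made up for the demonstration.

```python
import base64
import hashlib


def digest_has_nul(password: bytes) -> bool:
    """Return True if the raw SHA-256 digest of `password` contains a NUL byte."""
    return b"\x00" in hashlib.sha256(password).digest()


# Hypothetical sample passwords, used only to find digests with NUL bytes.
candidates = [f"password{i}".encode() for i in range(1000)]
with_nul = [p for p in candidates if digest_has_nul(p)]
print(f"{len(with_nul)} of {len(candidates)} raw digests contain a NUL byte")

# base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=', so the encoded
# digest never contains a NUL byte, and it is always 44 bytes long.
for p in with_nul:
    encoded = base64.b64encode(hashlib.sha256(p).digest())
    assert b"\x00" not in encoded
    assert len(encoded) == 44
```

The 44-byte encoded digest also stays well under the 72-byte limit, so nothing is cut off when it is passed on to ``bcrypt``.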

Using an unkeyed hash function is `recommended against`_ as it may expose
the system to `hash shucking`_ attacks. Instead, the hash function should use a
global `pepper`_ or a per-hash salt.
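
A minimal sketch of the peppered pre-hash step, following the HMAC variant discussed in the review thread. It only uses the standard library. The helper name ``prehash`` and the pepper value are hypothetical; a real pepper would be a random secret loaded from outside the password database.

```python
import base64
import hmac


def prehash(password: bytes, pepper: bytes) -> bytes:
    """Return base64(HMAC-SHA256(key=pepper, msg=password)).

    The result is 44 ASCII bytes, so it fits within bcrypt's 72-byte limit
    and contains no NUL bytes.
    """
    mac = hmac.digest(pepper, password, "sha256")
    return base64.b64encode(mac)


# Hypothetical pepper for illustration only; do not hard-code a real one.
pepper = b"example-pepper-loaded-from-a-secret-store"

# Two passwords that are identical in their first 72 bytes.
long_a = b"a" * 72 + b"first"
long_b = b"a" * 72 + b"second"

# The pre-hash depends on the whole password, not just the first 72 bytes.
assert prehash(long_a, pepper) != prehash(long_b, pepper)
assert len(prehash(long_a, pepper)) == 44
```

The pre-hashed value would then be passed to ``bcrypt.hashpw(prehash(password, pepper), bcrypt.gensalt())``, and verification would pre-hash the candidate password the same way before calling ``bcrypt.checkpw``. The pepper has to be stored separately from the hashes, as noted in the thread.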

Compatibility
-------------

@@ -252,3 +261,6 @@ identify a vulnerability, we ask you to contact us privately.
.. _`standard library`: https://docs.python.org/3/library/hashlib.html#hashlib.scrypt
.. _`argon2_cffi`: https://argon2-cffi.readthedocs.io
.. _`cryptography`: https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.scrypt.Scrypt
.. _`recommended against`: https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pre-hashing-passwords
.. _`hash shucking`: https://security.stackexchange.com/a/234795/
.. _`pepper`: https://en.wikipedia.org/wiki/Pepper_(cryptography)