Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99540: Constant hash for _PyNone_Type to aid reproducibility #99541

Merged
merged 2 commits into from
Dec 16, 2022

Conversation

yonillasky
Copy link
Contributor

@yonillasky yonillasky commented Nov 16, 2022

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@cpython-cla-bot
Copy link

cpython-cla-bot bot commented Nov 16, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@rhettinger rhettinger closed this Nov 19, 2022
@ethanfurman ethanfurman changed the title gh-99540: Constant hash for _PyNone_Type gh-99540: Constant hash for _PyNone_Type to aid reproducibility Dec 10, 2022
@ethanfurman
Copy link
Member

None, like True and False, is a special object in Python, and should have either a constant hash, or one that can be set by specifying the random hash seed.

@ethanfurman ethanfurman reopened this Dec 10, 2022
Objects/object.c Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@yonillasky
Copy link
Contributor Author

yonillasky commented Dec 10, 2022

There is some value in making it dependent on the hash secret (PYTHONHASHSEED):

  • for intentional fuzzing purposes
  • it addresses these nebulous concerns I've seen on the discuss forum, that people might start unwittingly depending on the specific constant value of this hash

The cost of doing that is an extra global variable and a single calculation done once upon setting the hash secret. I think that is a reasonable cost - I'll try writing it that way.

EDIT: now that I did, having second thoughts - it's a larger change to CPython, and perhaps people who run their code only on platforms with ASLR disabled today will be surprised by the "non-determinism by default" we'll be introducing here.

The hash secret was introduced to negate chosen input complexity attacks on builtin hash sets/maps, which is not relevant here, as None is a singleton. The "fuzzing" use case is secondary.

I'm willing to go with either solution. Personally, I will benefit more from the PYTHONHASHSEED dependent hash, but I don't know that it's the right thing to do for general use.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@netlify
Copy link

netlify bot commented Dec 10, 2022

Deploy Preview for python-cpython-preview canceled.

Name Link
🔨 Latest commit a1cbd39
🔍 Latest deploy log https://app.netlify.com/sites/python-cpython-preview/deploys/6394e55392d2760008ffb6e2

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

Python/bootstrap_hash.c Outdated Show resolved Hide resolved
@yonillasky
Copy link
Contributor Author

I'm waiting with the blurb update until we make a final decision whether to go with a hash-secret-dependent calculation or not.

@yonillasky
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@ethanfurman: please review the changes made to this pull request.

Copy link
Member

@ethanfurman ethanfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a simple constant hash for None.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ethanfurman
Copy link
Member

Looks good. We'll wait a week to see if anyone else wants to comment.

Copy link
Member

@methane methane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How 0xFCA86420 is chosen?
Is it better than random value?

@ethanfurman
Copy link
Member

The constant was chosen basically at whim (that particular value amused me); it's easier to implement as a constant because if we went with a random number we would have to make it contingent on the hash seed used to randomize str and byte hashes so that different runs of a program could be consistent (a virtual necessity for bug-hunting).

If it was a truly random value (which it currently is if Python was compiled with ASLR), then program results can vary from one run to the next with no way to force them to be the same (since None's hash is not currently dependent on the PYTHONHASHSEED).

@methane
Copy link
Member

methane commented Dec 12, 2022

it's easier to implement as a constant because if we went with a random number we would have to make it contingent on the hash seed used to randomize str and byte hashes so that different runs of a program could be consistent (a virtual necessity for bug-hunting).

I just meant constant random number, like 0xb1f921305a6be69e.

Since hashtable (dict and set) uses lower bits first, filling some lower bits can reduce hash collision against small constant ints.

But this is not a big issue because None is just one singleton.

@yonillasky
Copy link
Contributor Author

yonillasky commented Dec 12, 2022

I just meant constant random number, like 0xb1f921305a6be69e.

Just noting that if a 64-bit literal is used, we'd presumably need to add a static check for 32-bit builds, and add a 32-bit "fallback" value for that case.

@ethanfurman
Copy link
Member

Good point. Let's just use a 32-bit number instead.

@ethanfurman
Copy link
Member

Actually, unless I'm missing something, that is a 32-bit number.

@ericvsmith
Copy link
Member

I think that was in response to Inada-san's suggestion of 0xb1f921305a6be69e.

@ethanfurman
Copy link
Member

Ah, that's what I was missing. I wish we had a reply directly to a comment ability. :-(

@ethanfurman ethanfurman merged commit 432117c into python:main Dec 16, 2022
carljm added a commit to carljm/cpython that referenced this pull request Dec 16, 2022
* main:
  pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)
  pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)
  pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302)
  "Compound statement" docs: Fix with-statement step indexing (python#100286)
  pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
shihai1991 added a commit to shihai1991/cpython that referenced this pull request Dec 18, 2022
* origin/main: (1306 commits)
  Correct CVE-2020-10735 documentation (python#100306)
  pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)
  pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551)
  Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)
  pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)
  pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)
  pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)
  pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)
  pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302)
  "Compound statement" docs: Fix with-statement step indexing (python#100286)
  pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
  Improve stats presentation for calls. (pythonGH-100274)
  Better stats for `LOAD_ATTR` and `STORE_ATTR` (pythonGH-100295)
  pythongh-81057: Move the Cached Parser Dummy Name to _PyRuntimeState (python#100277)
  Document that zipfile's pwd parameter is a `bytes` object (python#100209)
  pythongh-99767: mark `PyTypeObject.tp_watched` as internal use only in table (python#100271)
  Fix typo in introduction.rst (python#100266)
  ...
@Lucas-C
Copy link
Contributor

Lucas-C commented Jan 31, 2023

Thank you for this, it allowed to make my project deterministic!

I dropped a comment about it there: https://stackoverflow.com/questions/7681786/how-is-hashnone-calculated/75296606#75296606

d-torrance added a commit to d-torrance/M2 that referenced this pull request Jun 26, 2023
This way, we don't need the "getPythonNone" function, which was only
called once, to create the object.  Beginning with Python 3.12, None
will have a constant hash anyway.  We use this new value.  See
python/cpython#99541.
d-torrance added a commit to d-torrance/M2 that referenced this pull request Aug 29, 2023
This way, we don't need the "getPythonNone" function, which was only
called once, to create the object.  Beginning with Python 3.12, None
will have a constant hash anyway.  We use this new value.  See
python/cpython#99541.
d-torrance added a commit to d-torrance/M2 that referenced this pull request Aug 30, 2023
This way, we don't need the "getPythonNone" function, which was only
called once, to create the object.  Beginning with Python 3.12, None
will have a constant hash anyway.  We use this new value.  See
python/cpython#99541.
mikestillman pushed a commit to Macaulay2/M2 that referenced this pull request Sep 13, 2023
This way, we don't need the "getPythonNone" function, which was only
called once, to create the object.  Beginning with Python 3.12, None
will have a constant hash anyway.  We use this new value.  See
python/cpython#99541.
@gsakkis
Copy link

gsakkis commented Mar 1, 2024

Is this merged into 3.12? There's no mention in the changelog.

@yonillasky yonillasky deleted the none-constant-hash branch March 1, 2024 23:47
@yonillasky
Copy link
Contributor Author

yonillasky commented Mar 1, 2024

Is this merged into 3.12? There's no mention in the changelog.

The PR did update the blurb, and it appears in git log when I check out branch 3.12.

What makes a given PR worthy of appearing in the changelog?
This one is not part of any PEP, nor was it planned work, or something that affects any language requirements. There is no contractual obligation that the hash will stay constant in the future, or that sets will continue to have iteration stability if used with deterministic hashes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants