Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-89189: More compact range iterator #27986

Merged
merged 12 commits into from
Nov 30, 2022

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Aug 27, 2021

@serhiy-storchaka serhiy-storchaka marked this pull request as ready for review August 29, 2021 12:56
@ambv ambv added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 8, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ambv for commit 175b0d3 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 8, 2021
@serhiy-storchaka
Copy link
Member Author

See also #28176.

Copy link
Member

@iritkatriel iritkatriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has merge conflicts now.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@iritkatriel iritkatriel dismissed their stale review November 27, 2022 15:33

conflicts were resolved.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a few nits.

Comment on lines +798 to +799
return Py_BuildValue("N(N)O", _PyEval_GetBuiltin(&_Py_ID(iter)),
range, Py_None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we care, but we might -- am I right that this writes pickles that can't be read by 3.11 or before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, absolutely no. I would never propose such change.

Internals will be different, but the unpickled iterator wil produce the same values.

PyObject *new_len = PyNumber_Subtract(r->len, state);
if (new_len == NULL)
return NULL;
Py_SETREF(r->len, new_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little scary -- if the new length is set but computing the new start fails, and the original iterator is still accessible, it will be truncated at the end rather than at the start. Let's update both fields after all errors have been checked.

Ditto in next().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Although __setstate__ is only supposed to be called for just created object when unpickle or make a deep copy, so there are no references to that iterator yet.

But there were other issue here. state could be a borrowed reference to r->len. It is used in two places: with PyNumber_Subtract() and with PyNumber_Multiply(). The second time it is used after setting the new r->state, when it can refer to a freed memory.

next() does not have such issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that it never happens in normal code. You have to call __setstate__() manually with the out-of-range value.

@AlexWaygood AlexWaygood changed the title bpo-45026: More compact range iterator gh-89189: More compact range iterator Nov 30, 2022
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may or may not want to change one thing, but if you do, no need to ask for a new review.

Comment on lines +986 to +989
PyObject *tmp = r->start;
r->start = new_start;
Py_SETREF(r->len, new_len);
Py_DECREF(tmp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a scenario where we can't just use

Py_SETREF(r->len, new_len);
Py_SETREF(r->start, new_start);

(which would be a little easier to follow).

Both new_len and new_start are definitely new references we own, so even if state was a borrowed reference to r->len everything would be all right.

I'll leave it up to you though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py_SETREF can only help when you assign a single object attribute, or if the attributes are independent. Two sequential Py_SETREFs can be a sign of a bug.

We do not check the type of state. It can be an arbitrary Python object with methods __lt__, __gt__, __mul__ and __rsub__. Therefore we do not control the types of new_len and new_start. Therefore we do not control the types of r->len and r->start. They can have __del__ methods which release the GIL after assignment in Py_SETREF. When the GIL is released, the iterator object can be used in other thread when it is in inconsistent state -- new r->len and old r->start.

It never occurs in normal code (state is always an exact int), but if you try hard, you perhaps can reproduce this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, that's a scenario I hadn't considered. Maybe the next time I am asked in some kind of Q&A or interview "do you have any regrets" I should mention __del__ methods.

@serhiy-storchaka serhiy-storchaka merged commit 7877642 into python:main Nov 30, 2022
@serhiy-storchaka serhiy-storchaka deleted the range-iter branch November 30, 2022 21:04
carljm added a commit to carljm/cpython that referenced this pull request Dec 1, 2022
* main: (112 commits)
  pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895)
  pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613)
  Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917)
  pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916)
  pythongh-89189: More compact range iterator (pythonGH-27986)
  bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491)
  pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906)
  pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850)
  pythongh-99845: Use size_t type in __sizeof__() methods (python#99846)
  pythonGH-99877)
  Fix typo in exception message in `multiprocessing.pool` (python#99900)
  pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869)
  pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893)
  pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825)
  pythonGH-81057: remove static state from suggestions.c (python#99411)
  Improve zip64 limit error message (python#95892)
  pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591)
  pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128)
  pythongh-82836: fix private network check (python#97733)
  Docs: improve accuracy of socketserver reference (python#24767)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants