bpo-43574: Dont overallocate list literals #24954

chadnetzer · 2021-03-21T03:19:56Z

Before v3.9, list literals weren't over-allocated, but this behavior regressed starting with v3.9:

Switched to initializing list literals w/ LIST_EXTEND
https://bugs.python.org/issue39320
#17984

Commit where over-allocation of list literals first appeared
https://bugs.python.org/issue38328
#17114
6dd9b64

This changes to the original behavior of not over-allocating lists for list literals, by not over-allocating any 0-length list that is extended. Technically this is a change in behavior, as lists that are initialized to length 0, and then immediately extended, will not be over-allocated on that initial extension. This change in behavior is likely innocuous, and possibly desirable, whereas the long-standing Python behavior of not using over-allocation for list literals is almost certainly desired.

Adds a test_overallocation test for lists to find regressions (assuming the regression doesn't alter the behavior of both the list-literal and list-initialization from a known length).

https://bugs.python.org/issue43574

This restores the behavior of Python versions before v3.9, where a list initialized from a list literal allocates memory for exactly as many elements as it contains (until a list append or other operation that changes the list size). The interpreter was changed in v3.9 to use the LIST_EXTEND bytecode to initialize a new list from a literal, which caused it to overallocate. By not performing this over-allocation on an empty list, we restored the original behavior (though now explicitly creating an empty list and extending it once will also not result in an over-allocation).

the-knights-who-say-ni · 2021-03-21T03:19:59Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@chadnetzer

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

pitrou

The proposed approach looks fine to me. Two things:

Can you sign the contributor's agreement?
Can you add a NEWS item? See https://devguide.python.org/committing/?highlight=blurb#updating-news-and-what-s-new-in-python

Lib/test/test_list.py

bedevere-bot · 2021-03-21T17:43:28Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Objects/listobject.c

Ensure that lists initialized from both list-literals and iterables of known length are initialized without over-allocation, by comparing them with allocation of lists initialized from an unknown size where over-allocation is expected.

Micro-optimization

chadnetzer · 2021-03-22T18:25:13Z

I have made the requested changes; please review again.

Let me know if the test_overallocation tests (Lib/test/test_list.py) seem reasonable, or overkill.

bedevere-bot · 2021-03-22T18:25:16Z

Thanks for making the requested changes!

@pitrou: please review the changes made to this pull request.

pitrou

+1, thank you

pitrou · 2021-03-22T18:45:00Z

I'll let other people review before merging this.

Misc/NEWS.d/next/Core and Builtins/2021-03-21-18-51-30.bpo-43574.mteI-I.rst

chadnetzer · 2021-03-23T23:57:06Z

Hmmm. The list_resize() function is used elsewhere in Objects/listobject.c, and in particular is called on lists (empty or otherwise) when inserting, appending, etc. So this PR will have an effect on empty lists in those areas as well, not just for list_extend(). This will essentially delay overallocation for empty lists in those situations (until after the first insert, append, etc), and possibly affect the amount of overallocation afterwards.

In the interest of fairness, I'll gather some examples to show the behavior change, at least to help decide if any of those situations will suffer a negative consequence as a result. (I'll edit this comment to with the examples, when ready)

Edit: Okay, so there is an additional complication with this change; really it's meant to target list_extend(), which is now used to initialize new lists from literals of length 2 or more, but it's implemented by changing list_resize(), which is used by more than list_extend(). And in particular, it also changes the behavior of appending or inserting to an empty list, by also not overallocating on that first append.

However, the current behavior (unpatched 3.10alpha), for an empty list, is to overallocate on the first append/insert, to allow space for 4 elements, and then overallocate again (to space for 8 elements) when the 4 elements are added. With this patch to list_resize(), however, the first append will not overallocate and so when appending the second item to that list, it will overallocate directly to 8 elements (skipping the overallocation for 4 items entirely).

Overall this is likely undesirable, and certainly not intended by this fix, it's just a side-effect of doing the work in the common helper (list_resize()) for list extend/append/insert/etc.

Here's an example of the list append/insert behavior change, to clarify:

$ ./python.exe
Python 3.10.0a6+ (remotes/gh/master:20a5b7e986, Mar 23 2021, 17:07:20) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
>>> # Here's the current insert behavior without the proposed [bpo-43574](https://bugs.python.org/issue43574) patch
>>> l = []
>>> l.__sizeof__()
40
>>> l.append("eggs")  # first append overallocates by 3 slots (ie. 4 total, 1 used)
>>> l.__sizeof__()
72
>>> l.append("ham")  # additional appends don't overallocate until 4 slots are used
>>> l.__sizeof__()
72

but with the patch, which (unintentionally) delays the over-allocation of the first append to an empty list

# Now w/ [bpo-43574](https://bugs.python.org/issue43574) patch (no overallocation for extends on empty lists)
$ ./python.exe
Python 3.10.0a6+ (heads/bpo43574-dont-overallocate-list-literals:7436223a71, Mar 23 2021, 17:40:29) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
>>> l = []
>>> l.__sizeof__()
40
>>> l.append("eggs")  # No overallocation on first append
>>> l.__sizeof__()
48
>>> l.append("ham")  # So the second append skips 4 total items, and allocates space for 8 items
>>> l.__sizeof__()
104

So, I think that warrants a delay in merging and some further consideration of how to handle this; I can think of a few ways to "fix" this, but not without some additional complication. Basically because we want to signal that initializing a list from a known number of items shouldn't overallocate, but adding to an already created list should be able to do so, which isn't quite expressible in the helper functions we have currently, imo. Or possibly we don't care about over-allocating when initializing from a known number of items (though it's a regression), but maybe we want to ensure that a known 3 or 4 item list doesn't immediately over-allocate to 8 items (before any appends, etc.) is what we want to change...

Edit 2: I think I have 1 or 2 fairly straightforward ways to adjust this PR to remain on-target for restoring list-literal overallocation behavior, w/o also inadvertently affecting list append/insert over-allocation behavior, but will have to wait until the weekend to follow up w/ the details (fyi).

The list_resize() helper is used by single-item list appends and inserts, which normally would overallocate (to a capacity of 4) on the first added element, and then again to 8 when the capacity is filled. But if this over-allocation is postponed on the first added element, the current expansion formula would then over-allocate to a capacity of 8 on the second append/insertion. List-literals of length 1 and 2 are not created with the list_extend() function (which then calls list_resize()), but instead built directly as capacity 1 or 2 lists with the BUILD_LIST opcode. By excluding the special case for list_resize() of empty lists when newsize is greater than 1, its allows list append/insert to continue to over-allocate without skipping capacity 4.

chadnetzer · 2021-04-04T05:01:52Z

I made a small but significant tweak to my original proposal, and added a lot of of commentary to the bug tracker issue. Basically, the latest tweak restores the existing list append/insert behavior for empty lists, while still reverting back to the long-standing behavior of not overallocating list-literals (which, for length 3 or more, are created by using the list_extend() helper)

pitrou

Thanks for the update. I would welcome a bit more thorough testing.

pitrou · 2021-04-04T14:08:41Z

Lib/test/test_list.py

+        for literal in test_literals:
+            # Ensure that both list literals, and lists made from an iterable
+            # of known size use the same amount of allocation. It will be
+            # verified later that no over-allocation occurs for list literals.


This comment says "verified later", but I see no specific test of no-over-allocation of list literals. There is SizeofTest in test_sys.py but it only tests two literal lengths.
I would imagine this kind of test:

prev_literal = None for literal in [[] + test_literals]: if prev_literal is not None # Allocation of literal N uses one more pointer than allocation of literal N-1 self.assertEqual(sizeof(literal), sizeof(prev_literal) + struct.calcsize('P')) prev_literal = literal

Yes, I should have said "verified below", ie. in same test, after the loop. It's an implicit test, though, since with the current over-allocation strategy, the list capacity can sometimes equal list length, so it's not universally true that each list-literal will have less memory allocated than a normal list of equivalent length. So it only can test that at least sometimes the list-literals are created with less capacity then a grown list of equivalent length.

I was avoiding a direct test of memory usage, like you propose, just to keep things general. But I can certainly add such a direct test, which (afaict) is really the only way to test that all the list-literals are being created without over-allocation.

@pitrou I ran with your idea, and added a direct test of the memory allocated for a list-literal, which also let me remove the confusing wording about verifying "later", since that's no longer the case. Thanks for the suggestion.

Confirm that list-literals aren't over-allocated by directly checking their used memory, which is the combination of the size of an empty list and the number of pointers needed for a list of exactly that length.

gpshead

Overall good, some test maintainability and debugging suggestions.

gpshead · 2021-04-11T21:46:14Z

Lib/test/test_list.py

+
+        # bpo-38373: initialized or grown lists are not always over-allocated.
+        # Confirm that over-allocation occurs at least some of the time.
+        self.assertEqual(True, any(x>0 for x in overalloc_amts))


add msg=repr(overalloc_amts) to aid debugging if this fails.

gpshead · 2021-04-11T21:47:07Z

Lib/test/test_list.py

+            # Direct check that list-literals do not over-allocate, by
+            # calculating the total size of used pointers.
+            total_ptr_size = len(literal) * struct.calcsize('P')
+            self.assertEqual(sizeof(literal), sizeof([]) + total_ptr_size)


add a msg=repr(literal) arg to all of the asserts in this loop to aid debugging directly from a test failure log.

gpshead · 2021-04-11T21:48:36Z

Lib/test/test_list.py

+                ]
+
+        overalloc_amts = []
+        for literal in test_literals:


no action required: meta: this is where I really wish we had an absltest.paramtereize or pytest.mark.parameterize available in the stdlib. it'd run all of these as unique tests with appropriate names instead of a loop that prevents others from running upon first failure.

I think unittest.subTest() displays parameters on the failing subtests and keeps running through each even on failure.

gpshead · 2021-04-11T21:49:09Z

Lib/test/test_list.py

+        # Confirm that over-allocation occurs at least some of the time.
+        self.assertEqual(True, any(x>0 for x in overalloc_amts))
+
+        # Empty lists should overallocate on initial append/insert (unlike


Turn this bit into an additional test method.

gpshead · 2021-04-11T21:49:36Z

Lib/test/test_list.py

+        # bpo-43574: Don't overallocate for list-literals
+        sizeof = sys.getsizeof
+
+        # First handle empty list and empty list-literal cases.  Should have no


make this set of initial assertions into its own test method.

gpshead · 2021-04-11T21:50:18Z

Misc/NEWS.d/next/Core and Builtins/2021-03-21-18-51-30.bpo-43574.mteI-I.rst

@@ -0,0 +1,3 @@
+``list`` objects don't overallocate when starting empty and then extended, or
+when set to be empty.  This effectively restores previous ``list`` memory


quantify what "previous" means. 3.8 and earlier I believe.

bedevere-bot · 2021-04-11T21:54:50Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

github-actions · 2021-06-03T00:22:32Z

This PR is stale because it has been open for 30 days with no activity.

ambv · 2022-05-17T16:35:08Z

This missed the boat for inclusion in Python 3.9 which accepts security fixes only as of today.

sweeneyde · 2022-05-17T16:57:53Z

Is this PR still relevant since #31816 was merged?

out of date

methane · 2022-05-18T02:55:14Z

Is this PR still relevant since #31816 was merged?

I think no. This PR and the #87740 can be closed.

chadnetzer added 3 commits March 20, 2021 20:00

Add testcase for bpo-43574 list overallocaton

4dbee9a

Skip overallocation calc when not overallocating

172f2c0

the-knights-who-say-ni added the CLA not signed label Mar 21, 2021

bedevere-bot added the awaiting review label Mar 21, 2021

pitrou requested changes Mar 21, 2021

View reviewed changes

Lib/test/test_list.py Outdated Show resolved Hide resolved

bedevere-bot added awaiting changes and removed awaiting review labels Mar 21, 2021

pitrou requested review from pablogsal and tim-one March 21, 2021 17:43

sweeneyde reviewed Mar 21, 2021

View reviewed changes

Objects/listobject.c Outdated Show resolved Hide resolved

Objects/listobject.c Outdated Show resolved Hide resolved

chadnetzer added 4 commits March 21, 2021 19:41

Fix list_extend() init list changes to PEP-7 form

d76ae2a

bpo-43574: Add NEWS

7aad246

Add more list overallocation regression tests

7c20c6c

Ensure that lists initialized from both list-literals and iterables of known length are initialized without over-allocation, by comparing them with allocation of lists initialized from an unknown size where over-allocation is expected.

Put common-case first in logical-or list_resize()

7be3ae8

Micro-optimization

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Mar 22, 2021

bedevere-bot added awaiting change review and removed awaiting changes labels Mar 22, 2021

bedevere-bot requested a review from pitrou March 22, 2021 18:25

pitrou approved these changes Mar 22, 2021

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting change review labels Mar 22, 2021

pablogsal reviewed Mar 22, 2021

View reviewed changes

Misc/NEWS.d/next/Core and Builtins/2021-03-21-18-51-30.bpo-43574.mteI-I.rst Outdated Show resolved Hide resolved

Adjust NEWS entry language based on PR feedback

7436223

chadnetzer added 2 commits April 3, 2021 19:45

Update test_overallocation of empty list append()

4336cc6

sweeneyde previously approved these changes Apr 4, 2021

View reviewed changes

pitrou reviewed Apr 4, 2021

View reviewed changes

Add direct test of list-literal non-overallocation

19d4374

Confirm that list-literals aren't over-allocated by directly checking their used memory, which is the combination of the size of an empty list and the number of pointers needed for a list of exactly that length.

gpshead added type-bug An unexpected behavior, bug, or error performance Performance or resource usage needs backport to 3.9 only security fixes labels Apr 11, 2021

gpshead requested changes Apr 11, 2021

View reviewed changes

bedevere-bot removed the awaiting merge label Apr 11, 2021

bedevere-bot added the awaiting changes label Apr 11, 2021

github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 3, 2021

gpshead added the needs backport to 3.10 only security fixes label Jul 6, 2021

gpshead self-assigned this Jul 6, 2021

gpshead removed their assignment Apr 20, 2022

ambv removed the needs backport to 3.9 only security fixes label May 17, 2022

chadnetzer mannequin mentioned this pull request May 17, 2022

Regression in overallocation for literal list initialization in v3.9+ #87740

Closed

methane closed this May 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-43574: Dont overallocate list literals #24954

bpo-43574: Dont overallocate list literals #24954

chadnetzer commented Mar 21, 2021 •

edited by bedevere-bot

Loading

the-knights-who-say-ni commented Mar 21, 2021

pitrou left a comment

bedevere-bot commented Mar 21, 2021

chadnetzer commented Mar 22, 2021

bedevere-bot commented Mar 22, 2021

pitrou left a comment

pitrou commented Mar 22, 2021

chadnetzer commented Mar 23, 2021 •

edited

Loading

chadnetzer commented Apr 4, 2021

pitrou left a comment

pitrou Apr 4, 2021

chadnetzer Apr 4, 2021

chadnetzer Apr 5, 2021

gpshead left a comment

gpshead Apr 11, 2021

gpshead Apr 11, 2021

gpshead Apr 11, 2021

sweeneyde Apr 11, 2021

gpshead Apr 11, 2021

gpshead Apr 11, 2021

gpshead Apr 11, 2021

bedevere-bot commented Apr 11, 2021

github-actions bot commented Jun 3, 2021

ambv commented May 17, 2022

sweeneyde commented May 17, 2022

methane commented May 18, 2022

		@@ -0,0 +1,3 @@
		``list`` objects don't overallocate when starting empty and then extended, or
		when set to be empty. This effectively restores previous ``list`` memory

bpo-43574: Dont overallocate list literals #24954

bpo-43574: Dont overallocate list literals #24954

Conversation

chadnetzer commented Mar 21, 2021 • edited by bedevere-bot Loading

the-knights-who-say-ni commented Mar 21, 2021

CLA Missing

pitrou left a comment

Choose a reason for hiding this comment

bedevere-bot commented Mar 21, 2021

chadnetzer commented Mar 22, 2021

bedevere-bot commented Mar 22, 2021

pitrou left a comment

Choose a reason for hiding this comment

pitrou commented Mar 22, 2021

chadnetzer commented Mar 23, 2021 • edited Loading

chadnetzer commented Apr 4, 2021

pitrou left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gpshead left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bedevere-bot commented Apr 11, 2021

github-actions bot commented Jun 3, 2021

ambv commented May 17, 2022

sweeneyde commented May 17, 2022

methane commented May 18, 2022

chadnetzer commented Mar 21, 2021 •

edited by bedevere-bot

Loading

chadnetzer commented Mar 23, 2021 •

edited

Loading