Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-44590: Lazily allocate frame objects #27077

Merged
merged 44 commits into from
Jul 26, 2021

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Jul 9, 2021

This PR:

  1. Moves all essential frame data to the (per-thread) stack
  2. Lazily allocates heap frame objects only when needed.

For most calls to Python functions, we can get the necessary memory for a stack frame by just bumping a pointer.
When we do need a frame object, for tracebacks and the like, it is lazily allocated.

According to my benchmarking run this PR produces a maximum speedup of 31% with a mean of 7%!

The performance numbers are suspiciously good.
If someone could run the comparison on their machine that would be much appreciated.

https://bugs.python.org/issue44590

…to tstate.pyframe as preparation for lazily created frames.
Python/ceval.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated
@@ -2009,42 +2016,57 @@ _Py_GetConfig(void)

#define MINIMUM_OVERHEAD 1000

PyObject **
_PyThreadState_PushLocals(PyThreadState *tstate, int size)
_Py_NO_INLINE PyObject **
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why _Py_NO_INLINE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the slow path. We want to make sure the fast path is inlined.

Python/pystate.c Outdated
PyObject **
_PyThreadState_PushLocals(PyThreadState *tstate, int size)
_Py_NO_INLINE PyObject **
_PyThreadState_PushChunk(PyThreadState *tstate, int size)
Copy link
Member

@pablogsal pablogsal Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to take pointers to arbitrary places in memory, doesn't the standard require that those must be aligned? How are we making sure that all pointers to this region are aligned?. From the standard:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

Many compilers normally assume aligned pointer access.

Copy link
Member Author

@markshannon markshannon Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think the returned memory would be unaligned?
How is this any different from malloc?

Copy link
Member

@pablogsal pablogsal Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think the returned memory would be unaligned?

My question is not if the returned memory would be unaligned, but if we are taking pointers to random places of the chunk array. For instance, if we are doing anything isomorphic to this:

        data = (char *)malloc(16);
        sp1 = (short*)(data+3);

then we are doing it wrong. I am asking not because I think we are doing it wrong, but because i am still wrapping around the structure you have in mind for the chunks, because there is 3 or 4 layers of indirections, a "chunk struct" and at least 3 pointer accesses when pushin frames, so immediately I am wondering if this mechanism is taking this correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compiled Python with the undefined behavior sanitizer and it doesn't complain, so seems that there is no problem here.

On contrast, my example shows:

example.c:1313:30: runtime error: load of misaligned address 0x55f1ba0cd283 for type 'short int', which requires 2 byte alignment
0x55f1ba0cd283: note: pointer points here
 00  3d 0e 0b e5 f4 55 00 00  00 00 00 00 00 00 00 00  09 09 49 42 4d 39 30 36  21 00 00 00 00 00 00

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand it found this (unrelated to this PR):

home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c3f7 for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c3f7: note: pointer points here
 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23
             ^
/home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c3ff for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c3ff: note: pointer points here
 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23
             ^
/home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c407 for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c407: note: pointer points here
 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23
             ^
/home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c40f for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c40f: note: pointer points here
 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  23 23 23
             ^

I will open an issue

if (new == NULL) {
return NULL;
}
tstate->datastack_chunk->top = tstate->datastack_top - &tstate->datastack_chunk->data[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tstate->datastack_chunk->top = tstate->datastack_top - &tstate->datastack_chunk->data[0];
tstate->datastack_chunk->top = tstate->datastack_top - tstate->datastack_chunk->data;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am missing anything here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It's just a hint to the reader that data is an array within the chunk.
chunk->data looks like a pointer deference, but &chunk->data[0] hints that this is an address calculation.

Python/pystate.c Outdated Show resolved Hide resolved
@markshannon markshannon requested a review from a team as a code owner July 21, 2021 12:26
Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Great job!

Let's run the buildbots and land!

@markshannon markshannon added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 26, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @markshannon for commit e8476b2 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 26, 2021
@markshannon
Copy link
Member Author

The failures on buildbot/AMD64 Windows10 PR look like something is wrong with that machine.

@markshannon markshannon merged commit ae0a2b7 into python:main Jul 26, 2021
@ncoghlan
Copy link
Contributor

ncoghlan commented Aug 1, 2021

One consequence of this change is that the code is now quite confusing, since the term "frame" is used ambiguously to refer to both frame objects and interpreter frames (specifically, the f_frame field on frame objects is now an interpreter frame reference, while tb_frame on tracebacks is still a full frame object).

@ncoghlan
Copy link
Contributor

ncoghlan commented Aug 1, 2021

The generator code uses gi_xframe rather than gi_frame to make the distinction a bit clearer, and I think doing that systematically (so there is consistent code level distinction between execution frames and introspection frames) would make the new code much easier to follow.

@ncoghlan
Copy link
Contributor

ncoghlan commented Aug 1, 2021

I've posted a proposed internal API refactoring design here: https://bugs.python.org/issue44800

@ncoghlan
Copy link
Contributor

ncoghlan commented Aug 1, 2021

And the PR implementing that proposed refactoring is up at #27525

@markshannon markshannon deleted the lazy-frame branch September 15, 2021 11:51
@@ -3,7 +3,7 @@
#include "pycore_pymem.h" // _Py_tracemalloc_config
#include "pycore_traceback.h"
#include "pycore_hashtable.h"
#include "frameobject.h" // PyFrame_GetBack()
#include <pycore_frame.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m curious why this one is included with <> when all the headers above use ""
(I don’t know much C note!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants