Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-86682: Adds sys._getframemodulename as a lightweight alternative to _getframe #99520

Merged
merged 15 commits into from
Jan 13, 2023

Conversation

zooba
Copy link
Member

@zooba zooba commented Nov 15, 2022

Replaces _getframe calls in collections, doctest, enum, and typing modules

…ame calls in collections, doctest, enum, and typing modules
Python/sysmodule.c Outdated Show resolved Hide resolved
@zooba zooba changed the title gh-86682: Adds sys._get_calling_module_name gh-86682: Adds sys._getcaller Nov 16, 2022
@zooba zooba changed the title gh-86682: Adds sys._getcaller gh-86682: Adds sys._getcaller as a lightweight alternative to _getframe Nov 16, 2022
Python/sysmodule.c Outdated Show resolved Hide resolved
Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typing change is fine, thanks for picking this up!

I might like your original version (sys._get_calling_module_name) better. It exposes less data, so it's easier to keep the interface if interpreter internals change.

@AlexWaygood AlexWaygood removed their request for review November 16, 2022 14:55
@zooba
Copy link
Member Author

zooba commented Nov 16, 2022

Yeah, turns out we can't use __module__ reliably here, so I'm inclined to change it back and make it only return the module name (rather than encoding use of __globals__ on the caller's side).

@zooba zooba changed the title gh-86682: Adds sys._getcaller as a lightweight alternative to _getframe gh-86682: Adds sys._getcallingmodule as a lightweight alternative to _getframe Nov 17, 2022
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the news file should mention the new function too.

I’d like a review by @markshannon .

I’m surprised the calling function isn’t obtainable, but the use cases are there, so no complaints about that.

@zooba
Copy link
Member Author

zooba commented Nov 21, 2022

I'd like a review from Mark, too.

The calling function is obtainable, even more easily than the module name. But in every case we already use this, we'd then have to go to func.__globals__['__name__'] and handle the cases where these are not consistent (__module__ is often <module> and not the actual module name).

If there's a use case for both, I'll happily add both, but right now, we don't need the function object for anything.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really need to get @markshannon to answer the question. But he’s on vacation this week.

Doc/library/sys.rst Outdated Show resolved Hide resolved
Lib/doctest.py Outdated Show resolved Hide resolved
Doc/library/sys.rst Outdated Show resolved Hide resolved
@markshannon
Copy link
Member

I'm thinking aloud here, so bear with me...

I don't think that _getframemodulename is a good name.
Not that _getframemodulename is a bad name for this API, maybe its just that the API is a bit odd?

It feels to me that this API is doing two things, getting the frame, then the module name. Why not split the parts?

frame = sys._getframe(n)
modulename = get_module_name(frame)

or, potentially

modulename = sys._getframe(n).f_modulename

Unfortunately this doesn't achieve the stated goal of removing (almost) all uses of sys._getframe(n)

Which leads us back to adding a function to get the caller function.
Functions already have a __module__ attribute, so we get

modulename = sys.getcallerfunc(n).__module__

Which seems cleaner to me.

@markshannon
Copy link
Member

@zooba You state in an earlier comment that __module__ is not reliable.
It seems to me that we should fix that, rather than working around the bug.

@markshannon
Copy link
Member

My cursory scan of the code suggests that __module__ == "<module>" only occurs in the repl, doctests and other source-less code.

@markshannon
Copy link
Member

Even in the repl, it seems that functtions have __module__ == "main".

@zooba
Copy link
Member Author

zooba commented Nov 30, 2022

@zooba You state in an earlier comment that __module__ is not reliable.
It seems to me that we should fix that, rather than working around the bug.

I didn't trace down exactly why it wasn't being filled in correctly, but I'm willing to bet it's a much more complicated compatibility issue to change it. __module__ is essentially a public API, while _getframemodulename is not.

It's definitely a bit of an odd API though, I agree. That's why I happily tried returning the function object instead. Perhaps it was only doctests that didn't set it correctly?

Also keen on your thoughts on the FRAME_OWNED_BY_CSTACK flag in this comment, @markshannon. I think I've used it right, but couldn't convince myself that this is the right way to skip frames.

@markshannon
Copy link
Member

You need to check for invalid frames when walking the frame stack.
You might want to factor out the walking the stack in sys._getframe() https://github.com/python/cpython/blob/main/Python/sysmodule.c#L1889

@markshannon
Copy link
Member

OOI, what tests fail if you rely on __module__?

@zooba
Copy link
Member Author

zooba commented Dec 2, 2022

I've been looking into the __module__ issue, and it basically seems that <module> functions don't have __module__ set, which means anything at the top-level of a module won't work (e.g. virtually every typing, enum and dataclass definition). And this is the primary reason we use this, because anything that we'll be able to unpickle later on is going to have to be an importable name.

The only reliable workaround is to get it from f_globals, as the code currently does. Tracing through the code, I think it's probably best to get func_module first if it's available, as that should let code assign to func.__module__ and override what gets returned. Then we can fall back to f_globals['__module__'] if it's not on the function object (yet) and still match.

I added a better test to compare _getframe with _getframemodulename and updated the check to match what is used in _getframe (which looks like a fairly different condition, but I trust you).

@markshannon
Copy link
Member

markshannon commented Dec 3, 2022

it basically seems that <module> functions don't have __module__ set

That sounds like a bug. Let's fix it rather than working around it.

@zooba
Copy link
Member Author

zooba commented Dec 5, 2022

Seems easy enough to fix, there's just a constructor that needs to pull __name__ out of globals.

Even so, I still don't see a reason why we'd return function objects. Our code either needs the module name, or else it needs actual frame information (e.g. current lineno). This is a private API, so if we need something different in the future we can always change it.

This is also why I made it not raise an error but return None, since everywhere we'd use it we would swap in a default value anyway (I was tempted to add a default argument, but we'd want to replace explicit None as well, so the semantics work better for us this way). We really never need the function object right now, at least not anywhere we don't also need the locals or lineno.

@zooba zooba changed the title gh-86682: Adds sys._getcallingmodule as a lightweight alternative to _getframe gh-86682: Adds sys._getframemodulename as a lightweight alternative to _getframe Dec 5, 2022
@markshannon
Copy link
Member

Even so, I still don't see a reason why we'd return function objects.

Two main reasons:

  • It is a simpler, cleaner API. It makes introspecting the frame stack for profilers and the like (that don't care about the values of local variables) more efficient, and less of an auditing issue.
  • Because we can. Until 3.11, we couldn't implement it. I suspect that if we could have, some of use cases that you describe would have used it before now.

My feeling is that sys.getcallerfunc(n) would not need an audit hook. Would you agree?
Or is that an argument in favor of sys._getframemodulename()?

@zooba
Copy link
Member Author

zooba commented Dec 6, 2022

My feeling is that sys.getcallerfunc(n) would not need an audit hook. Would you agree?
Or is that an argument in favor of sys._getframemodulename()?

I started writing a reply earlier making the argument that only returning the name wouldn't require a red flag (I think it still deserves an event, but it's far less of a risk than if random code is unexpectedly doing _getframe). But I stopped because I don't actually have a strong argument, just a gut feel.

The most interesting thing you can do with a function object is mess with func_globals, which is the same as sys.modules[sys._getframemodulename()] anyway, and dict lookups don't get an audit event (though possibly we could special case sys.modules if needed... I'd do that as my own patch first before trying to upstream it).

But because we don't use it, it means instead of sys._getframemodulename() or 'default', we'd be writing getattr(sys._getframefunction(), '__module__', None) or 'default' everywhere. That, plus YAGNI, is my argument for only returning the name (or None).

@zooba
Copy link
Member Author

zooba commented Jan 10, 2023

If there are no other concerns raised I'd like to merge this sometime this week.

@zooba zooba merged commit b5d4347 into python:main Jan 13, 2023
@zooba zooba deleted the gh-86682 branch January 13, 2023 11:31
@@ -93,7 +93,10 @@ _PyFunction_FromConstructor(PyFrameConstructor *constr)
op->func_doc = Py_NewRef(Py_None);
op->func_dict = NULL;
op->func_weakreflist = NULL;
op->func_module = NULL;
op->func_module = Py_XNewRef(PyDict_GetItem(op->func_globals, &_Py_ID(__name__)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyDict_GetItem is broken by design and should not be used. See #106033.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants