Infra: generate machine-readable PEP index #2475
Still on my short break, but this PR provided a nice distraction on a Sunday! (Edit: review is purely through the GitHub UI and I haven't run the actual changes to verify.)
It might also be nice to include at least the created date (I don't know what other fields can be guaranteed to be present and correct in all PEPs).
pep_sphinx_extensions/pep_zero_generator/pep_index_generator.py
I was actually thinking something very similar: we could generate a simple static JSON API with the JSONified PEP link and header data for machine-readable applications. Maybe put this under an /api/peps endpoint (at least once we're ready to expose it more publicly)? Then, if there were a need or desire in the future, we could add an authors endpoint, sub-endpoints such as api/peps/N to get a single PEP's metadata, etc.
It would be really nice to just expose all the header data, especially given the push toward ensuring the headers follow a consistent format and are machine-parsable, which also lets us do useful things in the rendered output (I have some additional PRs almost ready to go that improve this further).
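A static JSON API like this could be produced at build time by writing one file per resource. The sketch below assumes a hypothetical layout, api/peps.json for the whole index and api/peps/<N>.json for a single PEP, and takes the already-built dictionary of PEP metadata as input. None of these paths exist on the site; they only illustrate the idea.

```python
import json
from pathlib import Path
from typing import Dict


def static_api_files(peps: Dict[str, dict]) -> Dict[str, str]:
    """Map hypothetical relative paths to the JSON text each file would contain."""
    files = {"api/peps.json": json.dumps(peps, indent=1)}
    for number, details in peps.items():
        # One small file per PEP, so clients can fetch a single entry.
        files[f"api/peps/{number}.json"] = json.dumps(details, indent=1)
    return files


def write_static_api(peps: Dict[str, dict], out_dir: Path) -> None:
    """Write every file from static_api_files() under out_dir."""
    for relative_path, text in static_api_files(peps).items():
        path = out_dir / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
```

Keeping the path-to-content mapping separate from the writer makes the layout easy to test without touching the filesystem. Whether a host then serves api/peps/8.json at /api/peps/8 depends on its configuration.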
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Added! For those not present, their
Shall we put it there now? I've included a commit to move it to
Yes, stuff like that could be useful in the future.
I've added (all?) the header fields except for these, which contain email addresses:
And these which are less useful:
One more thing, sorry—why not make the top-level object an object (dictionary) with the PEP number as the key instead of an array? This would make it much easier and faster for clients to get a specific PEP, just
Sounds good; in the future, if there's any interest, we could actually parse the fields into more structured formats, especially once my PRs land that fully validate their format for much easier and trouble-free internal and external use.
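As one example of such parsing, the created and post_history values in the sample output in this thread use a DD-Mon-YYYY form. A minimal sketch that converts them to datetime.date objects, assuming every value is a plain date in that form (anything else raises ValueError); the helper names are hypothetical:

```python
from datetime import date, datetime
from typing import List, Optional


def parse_pep_date(value: str) -> date:
    """Parse a header date in the DD-Mon-YYYY form, e.g. "13-Jun-2000"."""
    # %b expects English month abbreviations (the default C locale).
    return datetime.strptime(value.strip(), "%d-%b-%Y").date()


def parse_post_history(value: Optional[str]) -> List[date]:
    """Split a comma-separated Post-History value into dates; null becomes []."""
    if not value:
        return []
    return [parse_pep_date(part) for part in value.split(",")]
```

For example, parse_post_history("07-Jul-2001, 09-Mar-2002") returns [date(2001, 7, 7), date(2002, 3, 9)].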
Seems prudent to me.
At least once my forthcoming PR is in (should be in a few hours), the field will be validated to actually match the current or historically-specified formats, such that you can confidently grab only the author name without the email. But we can wait on that, if desired.
This isn't really useful now, but if we do decide to keep it around, it could change in the future if we allow, e.g., MyST PEPs (but we could always add it then).
These two are a legacy of the old PEP 9 text format (AFAIK) and haven't done anything for a long time, so yeah, they should be elided (though if it's easy to do so, you could add the last modified date that's already automatically calculated and displayed below the PEP).
Updated to this structure:
{
"1": {
"title": "PEP Purpose and Guidelines",
"authors": "Warsaw, Hylton, Goodger, Coghlan",
"discussions_to": null,
"status": "Active",
"type": "Process",
"created": "13-Jun-2000",
"python_version": null,
"post_history": "21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013",
"resolution": null,
"requires": null,
"replaces": null,
"superseded_by": null,
"url": "https://peps.python.org/pep-0001/"
},
"2": {
"title": "Procedure for Adding New Modules",
"authors": "Faassen",
"discussions_to": null,
"status": "Superseded",
"type": "Process",
"created": "07-Jul-2001",
"python_version": null,
"post_history": "07-Jul-2001, 09-Mar-2002",
"resolution": null,
"requires": null,
"replaces": null,
"superseded_by": null,
"url": "https://peps.python.org/pep-0002/"
},
...
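To illustrate the point about keying by PEP number: with an object at the top level, a client can fetch one entry with a single dictionary lookup, whereas an array needs a scan. A sketch using two entries trimmed from the structure above; JSON object keys are always strings, so the number has to be converted first. The get_pep_from_array helper, and the "number" field it reads, are hypothetical and only show what the array variant would cost:

```python
import json
from typing import Optional

# Two entries trimmed from the sample structure above.
SAMPLE = """
{
  "1": {"title": "PEP Purpose and Guidelines", "status": "Active",
        "url": "https://peps.python.org/pep-0001/"},
  "2": {"title": "Procedure for Adding New Modules", "status": "Superseded",
        "url": "https://peps.python.org/pep-0002/"}
}
"""


def get_pep(peps: dict, number: int) -> Optional[dict]:
    """Direct lookup; JSON object keys are always strings, so convert first."""
    return peps.get(str(number))


def get_pep_from_array(pep_list: list, number: int) -> Optional[dict]:
    """Array alternative: a linear scan (assumes a hypothetical "number" field)."""
    return next((pep for pep in pep_list if pep["number"] == number), None)


peps = json.loads(SAMPLE)
```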
pep_dict = {
    pep.number: {
        "title": pep.title,
        "authors": ", ".join(author.nick for author in pep.authors),
Would it be better to make this an array of nicks, so users don't need to split the string again?
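The question comes down to how the authors value is serialized. Below is a self-contained sketch that builds the mapping both ways, using hypothetical stub Author and PEP classes in place of the generator's real ones; the authors_as_list flag is purely illustrative. It also uses a fresh loop variable for the authors rather than assigning to pep.authors.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Author:
    nick: str  # hypothetical stand-in for the real author object


@dataclass
class PEP:
    number: int
    title: str
    authors: List[Author] = field(default_factory=list)


def build_pep_dict(peps: List[PEP], authors_as_list: bool = False) -> Dict[int, dict]:
    result = {}
    for pep in peps:
        # A separate loop variable; "for pep.authors in pep.authors" would
        # rebind the pep.authors attribute to each Author in turn.
        nicks = [author.nick for author in pep.authors]
        result[pep.number] = {
            "title": pep.title,
            "authors": nicks if authors_as_list else ", ".join(nicks),
        }
    return result
```

Either way, json.dumps writes the integer PEP numbers as string keys ("1", "2", ...), which is why the published file uses string keys.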
Currently, all of the fields are strings (or null) and match the literal values of the headers (minus cleaned-up whitespace). If we're going to do this, it would be inconsistent not to process the other headers into more structured formats as well, which would be a good idea but is better left to a future PR. With #2484, tools can rely on each header matching a certain format, so the values are easy and consistent to work with as strings and can be split with just .split(","), without having to worry much about edge cases.
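On the consumer side, turning the current string value back into a list takes one line. A minimal sketch (split_authors is a hypothetical helper), assuming no individual name contains a comma, which is the kind of guarantee the header validation in #2484 is meant to provide:

```python
from typing import List, Optional


def split_authors(authors: Optional[str]) -> List[str]:
    """Turn an "authors" string such as "Warsaw, Hylton" into a list of names."""
    if not authors:
        return []
    # split(",") keeps the space that followed each comma, so strip it.
    return [name.strip() for name in authors.split(",")]
```

For example, split_authors("Warsaw, Hylton, Goodger, Coghlan") returns ["Warsaw", "Hylton", "Goodger", "Coghlan"].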
As mentioned above: "In the future, if there's any interest, we could actually parse the fields into more structured formats, especially once my PRs land that fully validate their format for much easier and trouble-free internal and external use."
Yes, let's merge this as is. We've not announced or documented this API, so it's fine to iterate and change the schema if needed. Thanks all for the reviews!
I just published the first thing using this JSON file :)
https://pypi.org/project/pepotron/
A CLI to open PEPs in your browser. Type a PEP number, a Python version to see that version's release schedule PEP, or some words to find the PEP with a matching title. For example:
$ pep 8
https://peps.python.org/pep-0008/
$ pep 3.11
https://peps.python.org/pep-0664/
$ pep "dead batteries"
Score Result
90 PEP 594: Removing dead batteries from the standard library
55 PEP 288: Generators Attributes and Exceptions
55 PEP 363: Syntax For Dynamic Attribute Access
55 PEP 476: Enabling certificate verification by default for stdlib http clients
52 PEP 349: Allow str() to return unicode strings
https://peps.python.org/pep-0594/
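For readers curious how these three kinds of lookup could be resolved against peps.json, here is a minimal sketch. It is not pepotron's implementation: it swaps in difflib.SequenceMatcher for the fuzzy title matching (so its scores will not match the ones above), and it assumes release-schedule PEPs are titled "Python X.Y Release Schedule".

```python
import difflib
import re
from typing import Dict, Optional

BASE_URL = "https://peps.python.org"


def similarity(query: str, title: str) -> float:
    """Score from 0 to 1 using difflib; pepotron's own scoring may differ."""
    return difflib.SequenceMatcher(None, query.lower(), title.lower()).ratio()


def resolve(query: str, peps: Dict[str, dict]) -> Optional[str]:
    query = query.strip()
    # 1. A PEP number: build the URL directly, zero-padded to four digits.
    if query.isdigit():
        return f"{BASE_URL}/pep-{int(query):04d}/"
    # 2. A version such as "3.11": assume a title like "Python 3.11 Release Schedule".
    if re.fullmatch(r"\d+\.\d+", query):
        wanted = f"Python {query} Release Schedule"
        for details in peps.values():
            if details["title"] == wanted:
                return details["url"]
        return None
    # 3. Anything else: the PEP whose title is most similar to the query.
    best = max(peps.values(), key=lambda d: similarity(query, d["title"]), default=None)
    return best["url"] if best else None
```

Only the third branch needs the title data; numbers map straight to URLs, which is also why keying the file by PEP number is convenient.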
I like the version number feature! Installed it, and I'll be making use of it.
Sweet! Admittedly, the version number feature is a bit of a workaround for the fact that release schedules are published as arbitrary PEPs and not in one cohesive, dedicated place, but especially for someone who's navigating between dozens of PEPs every day, it's a fantastic tool! (Though I fear ending up with hundreds of browser tabs if I'm not careful, heh, vs. my current, less convenient approach of having a few PEP tabs, typing
It would be cool to add option flags to list/search by various header fields, or a combination of the same. E.g.
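A rough sketch of the matching logic such flags could sit on top of, assuming the dict-of-dicts structure shown earlier in this thread. The filter_peps helper and its exact, case-insensitive matching are hypothetical and not part of pepotron:

```python
from typing import Dict, List


def filter_peps(peps: Dict[str, dict], **criteria: str) -> List[str]:
    """Return the numbers of PEPs whose fields match every criterion (case-insensitive)."""
    matches = []
    for number, details in peps.items():
        # A missing or null field never matches a requested value.
        if all(
            (details.get(name) or "").lower() == wanted.lower()
            for name, wanted in criteria.items()
        ):
            matches.append(number)
    return matches
```

Because the JSON field names (status, type, python_version, and so on) are valid Python identifiers, they map directly onto keyword arguments, e.g. filter_peps(peps, status="Active", type="Process").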
PEP 0 is a human-readable PEP index.
It would be useful to create a machine-readable version for, well, machines to read and process. This PR creates a peps.json file containing the key fields.
Preview: https://pep-previews--2475.org.readthedocs.build/peps.json
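Consuming the file needs only the standard library. A minimal sketch, assuming the preview URL above is still reachable (preview builds are temporary); fetch_peps and in_number_order are illustrative names:

```python
import json
import urllib.request
from typing import Dict, List, Tuple

# Preview build from this PR; it may not be available after the PR is merged.
PREVIEW_URL = "https://pep-previews--2475.org.readthedocs.build/peps.json"


def fetch_peps(url: str = PREVIEW_URL) -> Dict[str, dict]:
    """Download and decode the index (json.load accepts the bytes response)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def in_number_order(peps: Dict[str, dict]) -> List[Tuple[str, dict]]:
    """Sort entries numerically; plain string sorting would put "10" before "2"."""
    return sorted(peps.items(), key=lambda item: int(item[0]))
```

Usage: for number, details in in_number_order(fetch_peps()): print(number, details["title"]).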