Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convertDoxygen: Add docstrings to more python objects #2526

Merged
merged 3 commits into from
Aug 16, 2023

Conversation

chadrik
Copy link
Contributor

@chadrik chadrik commented Jul 7, 2023

Description of Change(s)

With these changes, the convertDoxgen build script adds docstrings for many additional python objects.

For example, from Sdf these objects are now documented:

SpecType
Specifier
Permission
Variability
AuthoringError
GetNameForUnit
GetUnitFromName
ValueHasValidType
GetTypeForValueTypeName
GetValueTypeNameForValue
ConvertToValidMetadataDictionary
JustCreatePrimAttributeInLayer
CopySpec
ComputeAssetPathRelativeToLayer
ListOpType
CreatePrimInLayer
JustCreatePrimInLayer
CreateVariantInLayer

The primary change is that we no longer filter "file" xml type: these are necessary to find info on non-class functions.

In addition, this PR also provides a change necessary to support creation of python type stubs: the parser now captures function default arguments.

Performance impact

Casting a wider net during parsing increased parse time from 25s to 1m45s. The addition of slots shaved off ~10s and reduced memory usage.

I've run this through a profiler, and the majority of the time is spent in the xml parsing, with about half of the parsing-time spent in pxr code and the other half in the parser base class. I think there's room to make this more efficient if we can avoid instantiating XMLNode unnecessarily. I noticed that for every class/function there is a useless "doxygen" root node, so I tried collapsing that down to a single root, but it only saved 10s and it creates some other problems.

I also tried using multiprocessing to speed up parsing, but even distributing the parsing of files across 16 processes did not speed it up -- I suspect this was due to the amount of DocElement instances that needed to be serialized and marshaled across to the primary process after parsing, due to the way that the multiprocessing module works in python. If that inference is correct, this approach could be improved by filtering the resulting DocElement classes while parsing, as there are many extraneous functions being captured.

tl;dr with this change, the doc generation process takes longer, but there's not a simple solution.

This also provides changes necessary to support creation of python type stubs
Changes:
- do not filter "file" xml type: these are necessary to find info on non-class functions
- capture function default arguments
- add slots for efficiency: there are a lot of xml nodes
- fix a bug with Writer.__init__
@@ -52,29 +52,6 @@ class Writer:
# we can combine property docstrings for getters/setters.
propertyTable = {}

# Default constructor that assumes we're processing a single module
def __init__(self):
#
Copy link
Contributor Author

@chadrik chadrik Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python doesn't support overloads like C++: you can't simply define a function twice. This __init__ was being replaced by the one below it, so calling Writer() without args never actually worked. I've fixed the use of this class in convertDoxygen

logfile.write('\n'.join(lines))
logfile.close()
with open(output_file, 'w') as logfile:
logfile.write('\n'.join(lines))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use context manager to ensure that the file is closed in case of an exception (also, more idiomatic)

@@ -42,13 +42,18 @@ class XMLNode:
"""
Rrepresent a single node in the XML tree.
"""
__slots__ = ("parent", "name", "attrs", "text", "childNodes")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added slots for performance: we make a lot of these classes.

def isStatic(self):
"""Is this doc element static?"""
return self.static is not None and self.static == 'yes'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added isStatic and __repr__ methods for convenience

writer = Writer(packageName, moduleName)
# Parser.traverse builds the docElement tree for all the
# doxygen XML files, so we only need to call it once if we're
# processing multiple modules
if (docList == None):
if (docList is None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using is instead of == is considered idiomatic because None is a singleton.

@jesschimein
Copy link
Contributor

Filed as internal issue #USD-8482

@chadrik chadrik changed the title Doc stubs convertDoxygen: Add docstrings to more python objects Jul 7, 2023
.gitignore Outdated
@@ -1,4 +1,4 @@
.p4*
.DS_Store
.AppleDouble

*.pyc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're adding this - don't suppose you could add __pycache__ to the list as well?
Had that queued up in another PR, but if you just add that to your MR, would avoid a merge conflict...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chadrik, is the .gitignore change critical for the main goal of this PR? For tracking changes for USD releases, it helps if all the changes for a PR are related.

If the .gitignore change isn't specifically needed for this PR, would it be possible to move the .gitignore change (maybe combined with @pmolodo's change) to a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed!

Comment on lines -224 to -225
if (kind == "file"):
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this change is the heart of this PR - unfortunately, when we tested it, it had some negative side effects.
Specifically, many of our classes lost their docstrings after running convertDoxygen.py with this change.

One example was pxr.Usd.ZipFileWriter - in Usd/__DOC.py, we now get:

   result["ZipFileWriter"].__doc__ = """"""

instead of:

   result["ZipFileWriter"].__doc__ = """
Class for writing a zip file. This class is primarily intended to
support the .usdz file format. It is not a general-purpose zip writer,
as it does not implement the full zip file specification. However, all
files written by this class should be valid zip files and readable by
external zip modules and utilities.

"""

We're using Doxygen 1.8.18... which is old, but also newer than the 1.8.14 listed in VERSIONS.md.

@chadrik, can you confirm whether or not you're seeing the same, and what version of Doxygen you're using?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pmolodo , you made us realize we neglected to update VERSIONS.md for the addition of doxygen-awesome, which requires 1.9.x (we are using 1.9.6 internally). We will be pushing out updates to VERSIONS.md shortly, but can you retry your experiments with a 1.9.x version?

Copy link

@dsyu-pixar dsyu-pixar Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a test with only removing these lines and comparing build output (using doxygen 1.8.x). The new __DOC.py appears to have duplicate (empty) values for some things, like ZipFileWriter as @pmolodo notes, which end up overwriting the earlier defined values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed.

@chadrik
Copy link
Contributor Author

chadrik commented Jul 28, 2023 via email

…arsing the additional xml entries

These elements shadow class elements of the same name, but they lack any valuable information, and thus create empty docstrings.
@chadrik
Copy link
Contributor Author

chadrik commented Jul 28, 2023 via email

@sunyab sunyab changed the base branch from release to dev August 8, 2023 22:51
@pixar-oss pixar-oss merged commit 5f55f2d into PixarAnimationStudios:dev Aug 16, 2023
28 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants