Tracking issue: stub generation #420
Great to see some discussion on first-party support for stub generation in Nanobind. I would like to add an important use-case for stub generation: documentation. As an example, we've generated this reference documentation for Lagrange, which uses Nanobind to create Python bindings (still very much WIP). I tried to use Sphinx Autodoc, but it doesn't support native C extensions (sphinx-doc/sphinx#7630). So instead I am using this Sphinx AutoAPI extension, which works by parsing stub files generated by Nanobind. |
@jdumas You might be interested in seeing the approach for an upcoming version of the Dr.Jit project that is based on nanobind: https://drjit.readthedocs.io/en/nanobind_v2/reference.html. I've so far managed without stubs -- this is the associated Sphinx file, with plenty of use of |
1. Yes, somewhat.
I believe that splitting point 3 … For the default arguments, there could be an extension to |
@njroussel: After some thinking, the approach that I was planning regarding point 3 is as follows (it somewhat mirrors what MyPy's stubgen does): simple default arguments that can be parsed out of the box are included in the stub. More complicated things (elaborate types, multi-line strings) that might actually reduce the readability of a function signature are abbreviated with an ellipsis (...). The stub generator will by default accept types of: … Putting it all together, it might look like this: nanobind_add_stubs(
# directory relative to which the stub generator will be called
# (the modules should be importable from here)
PATH ${CMAKE_INSTALL_PREFIX}
# Sequence of modules and submodules, stubgen will separately process each one
MODULES nanobind_example nanobind_example.submodule_1
# Ensure that a given dependency is built and installed
DEPENDS nanobind_example_ext
# The user can provide a python file with hooks as part of their project
# that modifies the behavior of stubgen
HOOKS ${CMAKE_CURRENT_SOURCE_DIR}/stubgen_hooks.py
) |
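For illustration, here is a hypothetical sketch of what such a stubgen_hooks.py file could look like under this proposal; the hook names and signatures below are assumptions made up for the example, not an actual nanobind API:
# stubgen_hooks.py -- hypothetical example; the hook names are illustrative
# assumptions under the proposal above, not part of any real nanobind API.

def rewrite_type(type_name: str) -> str:
    """Map internal extension-module type names to their public aliases."""
    return type_name.replace("nanobind_example.nanobind_example_ext.",
                             "nanobind_example.")

def rewrite_default(default_repr: str) -> str:
    """Keep defaults that are valid Python expressions; abbreviate the rest."""
    try:
        compile(default_repr, "<stub-default>", "eval")
        return default_repr
    except SyntaxError:
        return "..."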
cc also @skallweitNV & @laggykiller, who seem interested in stub generation |
This all sounds excellent. Some comments from my point of view: I'd like to still be able to build stubs during development at build time, but people who want that can probably just call the stub generator manually from their build system; no need to have it in nanobind's CMake infrastructure. Regarding default values: handling them using hooks seems like a good idea, but you could still consider the … And thanks for all your efforts @wjakob, nanobind is awesome! |
Sounds great and looking forward to this getting implemented! One thing to suggest: some users may add Sphinx docstrings to the nanobind code (e.g. myself, see this example). In those cases, I think the nanobind stub generator could try to parse the Sphinx docstring and use it as a hint for function signatures and stub generation. This would provide a way for users to override the auto-generated stub if it is somehow incorrect. Unfortunately, Sphinx docstrings do not have a standardized way to denote the default value of a function argument, so this can't be used to solve the problem of default arguments that do not make sense as a Python expression. Or, we could create our own standard for how default values should be denoted in Sphinx docstrings so that they could be recognized and used as hints during stub generation. Another small thing to add: if the representation of a default argument does not make sense as a Python expression, we could denote the type of the argument as Optional. |
That's not really necessary as there is the ellipsis syntax for omitting default values from stubs. Besides, it would make type checkers validate code that explicitly passes an Optional value even when it's not applicable. |
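For reference, this is what the ellipsis convention looks like in a stub; the names below are placeholders for illustration only:
class rgb: ...  # placeholder type standing in for a bound C++ type

# A default that can't (or shouldn't) be spelled out as a Python expression is
# written as "..." in the stub; type checkers accept this without requiring
# the argument's type to be widened to Optional.
def add_frame(pixels: rgb, trns_color: rgb = ..., delay_num: int = 100) -> int: ...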
Dear all, I've made progress on this and have a working stub generation workflow that works well on a large-ish project (~300KB stub file with a mixture of classes, functions, properties, enums, and mixed bindings+pure Python code). Could I ask you all to check it out and report any issue you encounter? The latest version is available on the stubgen branch. There is also documentation on ReadTheDocs:
I followed @skallweitNV's suggestion and for now only have a CMake command that runs at build time. I plan to look into an install-time option later. I also plan to make one major API- and ABI-breaking change that will imply moving to nanobind version 2.0 (!). The change concerns the nb::raw_doc annotation, which currently overrides the whole docstring, including the signature, in "raw" form: m.def("f", &f, nb::raw_doc(R"(
my_func(a: complicated_type)
A docstring)")); Interpreting this "raw" format is problematic when the stub generator needs to split up the signature into its constituent overloads. Hence, my plan is to replace it with a new annotation named nb::signature: m.def("f", [](std::vector<int>) { }, nb::signature("(arg: FantasticVector[int]) -> None"), "Docstring 1");
m.def("f", [](std::set<int>) { }, nb::signature("(arg: FantasticSet[int]) -> None"), "Docstring 2");
The stub generator can then turn this into:
from typing import overload
@overload
def f(arg: FantasticVector[int]) -> None:
    """Docstring 1"""
@overload
def f(arg: FantasticSet[int]) -> None:
    """Docstring 2"""
del overload |
Nice work! Some minor problems while trying it out with my project: https://github.com/laggykiller/apngasm-python/tree/stubgen-official Problem 1: I tried to add the following lines in CMakeLists.txt:
This does not work if
This is not quite ergonomic. If I want to build a wheel with stubs, I have to build and install the wheel without stubs first, run the command to generate the stubs, and then build the wheel again with the stubs! Problem 2: References to classes within the same file are not correct. Consider this pyi file: class APNGAsm:
    @overload
    def __init__(self, frames: Sequence[apngasm_python._apngasm_python.APNGFrame]) -> None:
        ...
class APNGFrame:
    ...
It should be:
class APNGAsm:
    @overload
    def __init__(self, frames: Sequence[APNGFrame]) -> None:
        ...
class APNGFrame:
    ...
Alternatively:
from . import _apngasm_python
class APNGAsm:
    @overload
    def __init__(self, frames: Sequence[_apngasm_python.APNGFrame]) -> None:
        ...
class APNGFrame:
    ...
Problem 3: It is not necessary to delete imports in the pyi file; doing so triggers a warning in Pylance: |
P.S. Consider this function to be bound:
Note that … This is the binding code:
This is the generated signature: def add_frame_from_rgb(self, pixels_rgb: apngasm_python._apngasm_python.rgb, width: int, height: int, trns_color: apngasm_python._apngasm_python.rgb = 0, delay_num: int = 100, delay_den: int = 1000) -> int: I would expect the default value of trns_color to be None rather than 0. |
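For comparison, the expected output would presumably render the nullptr default as None; the following is an illustrative reconstruction (not actual stubgen output), with rgb reduced to a placeholder class:
from typing import Optional

class rgb: ...  # placeholder standing in for apngasm_python._apngasm_python.rgb

class APNGAsm:
    # Illustration of the expectation only: a C++ nullptr default rendered as
    # None, with the argument type widened to Optional[rgb].
    def add_frame_from_rgb(self, pixels_rgb: rgb, width: int, height: int,
                           trns_color: Optional[rgb] = None,
                           delay_num: int = 100, delay_den: int = 1000) -> int: ...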
@laggykiller Thanks for giving it a try. The challenge with generating stubs at build vs. install time is mentioned above and was something I still wanted to look into. In your example, wouldn't it be better to generate a stub for the actual module containing your types (i.e., …)? Alternatively, let me show you a dirty hack that I use in my own projects.
In other words, importing the sub-extension causes it to add its types to the parent module. That way, all of the type metadata (…) An alternative option for the stubs would be to register a rewriting rule with the stub generator that rewrites all occurrences of … For your second message, does it work if you annotate the argument as …? The … Edit: never mind about the last point. I found out that the |
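If the stub generator offered no such rewriting rule, a similar effect could be achieved by post-processing the generated file; a minimal sketch (the stub path and module names are taken from the examples above and are otherwise assumptions):
# Post-processing sketch: shorten fully qualified internal names
# (apngasm_python._apngasm_python.X -> X) in an already-generated stub.
import re
from pathlib import Path

stub = Path("apngasm_python/__init__.pyi")  # assumed output location
text = stub.read_text()
text = re.sub(r"\bapngasm_python\._apngasm_python\.", "", text)
stub.write_text(text)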
I don't quite get what you are trying to say, but I think my command is doing what you said...? I specified
I replaced the following lines in my binding code:
Replaced as:
This causes an error during compilation:
So I replaced
The signature generated is
I think it is fine to not … By the way, would it be better if, instead of: from typing import Annotated
from typing import Optional
from typing import Sequence
from typing import overload
we imported them on a single line: from typing import Annotated, Optional, Sequence, overload |
That's right. But did you check the contents of …? Thanks for letting me know about the issue with … In the meantime, I added support for install-time stub generation. |
@laggykiller Can you try again with the latest version? This may also fix your issue with the workaround I had mentioned earlier. For the |
I can see that … A solution specific to this hack is to add
It looks like stub generation never happens if I add … Here is my … |
@laggykiller Are you sure that you used the latest version? The output looks wrong to me (for example, the latest version of the branch now groups imports). Regarding the other issue, I suspect that it may be related to a requirement to specify |
I tried again just now, and the generated stub is error-free regardless of whether I use the hack 🎉 (except for the NULL default value being treated as 0 instead of None). I did notice that
Stub generation from CMake is still not happening, though:
You can try by:
Stub generation is expected to occur during the build (and show up in the log), and the stub is expected to end up inside the wheel, but it seems like no stub generation occurred at all. |
@laggykiller Compilation of your repository fails on my machine (macOS) in a Boost build step performed by Conan.
However, it works after removing Boost, since it doesn't actually seem to be used by the project. I figured out the issue with the install-time stub generation; this is now fixed. I also added the ability to pass … nanobind + stubgen now avoid the … |
Thanks for reporting! I can confirm this issue; it seems that newer macOS versions install a newer Xcode, which defaults to a newer C++ standard that removed
I can confirm that it is now fixed 🎉! The only problem left is the NULL default value being treated as 0 instead of None...
I think using |
I find the … For the … |
@wjakob Got it, thanks! No more problems from my side. Let's see others' test results... |
Nice work! I just wanted to comment that I'm currently working on providing Bazel support for Python bindings with nanobind - I'll try adding the stubgen into a project and report how it goes when I get time. |
PR #421 has reached a state where I'm basically happy with it. It would be great if others can try it as well and report their successes or problems (e.g. @torokati44 @qnzhou @tmsrise @rzhikharevich-wmt). |
@wjakob Wow, thanks for all the work you put into this extension 🎉. I will archive my project and add a link to the official implementation as soon as it is merged. Maybe this should be a question in the nanogui repository, but will you be releasing a new package of nanogui that includes the stubs as well? |
Great idea, that would be a nice test that all works as expected. |
Thanks @wjakob, I'm really excited for this! I do have some questions about its use with submodules, though, e.g. from line 344 of … I tried something like the following, with a separate stubgen call for each submodule:
but this hits errors:
|
@wjakob just to follow up on this: while individual calls to … |
@wjakob Two problems:
Example: from collections.abc import Sequence
from numpy.typing import ArrayLike
from typing import overload, Optional, Annotated Should be: from collections.abc import Sequence
from typing import Annotated, Optional, overload
from numpy.typing import ArrayLike |
@laggykiller: There is some sorting already in place. I am open to PRs that improve this further if it can be done compactly (without adding a lot of complex code). It's not something I plan to work on. |
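For anyone tempted to send such a PR, the grouping boils down to something like the following sketch; the heuristics here (merging "from X import" lines and putting the standard library first via sys.stdlib_module_names, which requires Python 3.10+) are assumptions for illustration, not the code currently in stubgen.py:
# Sketch: merge and sort "from X import a, b" lines, standard library first.
import sys
from collections import defaultdict

def group_imports(lines: list[str]) -> list[str]:
    by_module: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        module, _, names = line.removeprefix("from ").partition(" import ")
        by_module[module].update(n.strip() for n in names.split(","))

    def sort_key(module: str) -> tuple[int, str]:
        top = module.split(".")[0]
        return (0 if top in sys.stdlib_module_names else 1, module)

    return [f"from {m} import {', '.join(sorted(by_module[m]))}"
            for m in sorted(by_module, key=sort_key)]

print("\n".join(group_imports([
    "from typing import overload, Optional",
    "from numpy.typing import ArrayLike",
    "from typing import Annotated",
    "from collections.abc import Sequence",
])))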
@CarlosBergillos @laggykiller @tmsrise I added a |
@laggykiller generation of |
Opened a PR #462
I don't have a nanobind project that is complex enough to test this. However, I did open PR #463 to allow running stubgen recursively from CMake using
Looks like the PR already handles … |
Nice! I'm very happy to see a … I've been testing it. The folder structure / names are different from what I would expect.
here I would expect to get (a):
or even better (b):
To implement (b) we need to know in advance whether a module has submodules or not (if it does, then create a new folder + an __init__.pyi inside). For some inspiration … But I'm also fine with (a) if it's easier. AFAIK (a) and (b) are functionally equivalent. Another thing that is currently not working properly in … |
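As an aside, knowing whether a module has submodules (which (b) needs) only takes a little introspection; a sketch, under the assumption that extension submodules show up as attributes of the imported parent module:
# Heuristic sketch: treat a module as a package-with-submodules if any of its
# attributes is itself a module whose name is nested under the parent's name.
import importlib
import types

def has_submodules(module_name: str) -> bool:
    mod = importlib.import_module(module_name)
    prefix = mod.__name__ + "."
    return any(isinstance(v, types.ModuleType) and v.__name__.startswith(prefix)
               for v in vars(mod).values())

# e.g. emit "pkg/__init__.pyi" when has_submodules("pkg") is True, else "pkg.pyi"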
For my project I added a small wrapper function: function(nanobind_add_typed_module tgt)
nanobind_add_module(${tgt} ${ARGN})
nanobind_add_stub(
${tgt}_stub
MODULE ${tgt}
OUTPUT ${tgt}.pyi
PYTHON_PATH $<TARGET_FILE_DIR:${tgt}>
MARKER_FILE py.typed
DEPENDS ${tgt}
)
endfunction() This allows for a 'clean' cmkr configuration: [cmake]
version = "3.15"
cmkr-include = "cmake/cmkr.cmake"
[project]
name = "xxx_native"
[find-package.Python]
version = "3.8"
components = ["Interpreter", "Development.Module"]
[fetch-content.nanobind]
git = "https://github.com/wjakob/nanobind"
tag = "af57451"
include-after = ["cmake/nanobind.cmake"]
[template.nanobind_module]
type = "shared"
add-function = "nanobind_add_typed_module"
pass-sources = true
[target.xxx_native]
type = "nanobind_module"
sources = ["src/xxx_native.cpp"]
compile-features = ["cxx_std_17"]
My use case is just a simple Python extension; in my __init__.py I have: from .build.xxx_native import * which allows for a clean development experience and proper types. You just have to not use a multi-config generator and use … Thanks for making this project by the way, it made everything relatively painless! |
Took a look at the most recent master. Seems like commit df8996a adds |
@tmsrise I can catch … Even better, try to run … Edit: Done some testing; looks like all stdlib modules have … |
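For context, one general way importlib.util.find_spec() can raise ValueError rather than ModuleNotFoundError (not necessarily the cause here) is when a module is already in sys.modules with __spec__ set to None, which is the case for __main__ when a file is run as a plain script:
# Demonstration: find_spec() raises ValueError for an already-imported module
# whose __spec__ is None (e.g. __main__ when this file is executed directly).
import importlib.util

try:
    importlib.util.find_spec("__main__")
except ValueError as e:
    print("ValueError:", e)  # typically "__main__.__spec__ is None"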
@tmsrise I committed @laggykiller's patch, does that fix your issue? |
Sorry for the delay. It fixed the issue, thanks! @laggykiller I'm not using any external python modules though. The |
@tmsrise does the error occur when using recursive (…)? Can you help me find the culprit? Temporarily edit lines 1114 to 1117 (at commit f52877c)
To this:
try:
    spec = importlib.util.find_spec(module)
except ModuleNotFoundError:
    return 1
except ValueError as e:
    raise ValueError(module + str(e))
See the error message. Your help is greatly appreciated. |
try:
    spec = importlib.util.find_spec(module)
except ModuleNotFoundError:
    return 1
except ValueError as e:
    raise ValueError(module + ", " + str(e))
Printing instead of raising the error shows the error for this BASE submodule multiple times as the other submodules are being generated. This is my current CMake snippet; if there are any mistakes that would cause this, or if there's a way to make this less janky, let me know. set(PACKAGE_DIR ${CMAKE_CURRENT_BINARY_DIR}/my_sanitized_module)
file(MAKE_DIRECTORY ${PACKAGE_DIR})
find_package(Python 3.8 COMPONENTS Interpreter Development Development.Module REQUIRED)
# my_sanitized_module will be the python module name
nanobind_add_module(my_sanitized_module NB_STATIC src/py_bindings.cpp)
# Link the my_sanitized_module module with our real library
target_include_directories(my_sanitized_module PUBLIC ${CMAKE_SOURCE_DIR}/include/MySanitizedCppLib)
target_link_libraries(my_sanitized_module PUBLIC MySanitizedCppLib)
# Add the various binding source files for each submodule into the nanobind target
add_subdirectory(src)
# Create the stubs
add_custom_target(GenerateStubs ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND python3 ${CMAKE_BINARY_DIR}/_deps/nanobind-src/src/stubgen.py -r -m my_sanitized_module -O ${PACKAGE_DIR} -M ${PACKAGE_DIR}/py.typed
DEPENDS my_sanitized_module
)
# Rename the top level pyi to __init__.pyi so it works
add_custom_target(
RenameFile ALL
COMMAND ${CMAKE_COMMAND} -E rename ${PACKAGE_DIR}/my_sanitized_module.pyi ${PACKAGE_DIR}/__init__.pyi
DEPENDS GenerateStubs
)
# Create an init file in our build directory, gets installed with everything else later. This __init__ file is crucial for python being able to read the module. If it doesn't exist or is empty it will not work.
FILE(WRITE ${CMAKE_CURRENT_BINARY_DIR}/__init__.py
"from .my_sanitized_module import *")
# Install the package directory we created, which holds all of our stubs
install(DIRECTORY ${PACKAGE_DIR}
DESTINATION ${Python_SITEARCH})
# Install our __init__.py file
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/__init__.py
DESTINATION ${Python_SITEARCH}/my_sanitized_module)
# Now we're installing the actual module binary to that same package directory
install(TARGETS my_sanitized_module
COMPONENT python
LIBRARY DESTINATION ${Python_SITEARCH}/my_sanitized_module)
|
Re: stub staleness, I'm also trying to deal with this. I think it would make sense for hash(module) to always return the same thing given the contents of the nb module? Generated stubs can be marked with this so just one file needs to be checked to see if they need to be updated. Right now hash(module) returns a different number for each process |
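A content-based fingerprint of the bound module (rather than Python's id-based hash(), which differs between processes) would give the stubs a stable staleness marker; a rough sketch of the idea, not tied to nanobind internals:
# Sketch: derive a stable fingerprint from a module's public surface
# (attribute names and docstrings) instead of using hash(module).
import hashlib
import inspect

def module_fingerprint(mod) -> str:
    h = hashlib.sha256()
    for name in sorted(dir(mod)):
        h.update(name.encode())
        h.update((inspect.getdoc(getattr(mod, name)) or "").encode())
    return h.hexdigest()

# The generated stub could embed this value in a comment, so regeneration is
# only needed when the stored fingerprint no longer matches the module.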
I'm seeing an odd error where stub generation fails with an import error, but only on Windows and only in GitHub CI (I am unable to reproduce it on my local Windows machine). I am using: nanobind_add_module(
soem_ext
STABLE_ABI
NB_STATIC
pyecm/soem_ext.cpp
)
nanobind_add_stub(
soem_ext_stub
MODULE soem_ext
OUTPUT "${CMAKE_SOURCE_DIR}/pyecm/soem/soem_ext.pyi"
PYTHON_PATH $<TARGET_FILE_DIR:soem_ext>
DEPENDS soem_ext
MARKER_FILE py.typed
VERBOSE
)
# Install directive for scikit-build-core
install(TARGETS soem_ext LIBRARY DESTINATION "pyecm/soem")
install(FILES "${CMAKE_SOURCE_DIR}/pyecm/soem/soem_ext.pyi" DESTINATION "pyecm/soem")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/py.typed" DESTINATION "pyecm/soem")
But I get an import error during stub generation (again, only on Windows and only in GitHub CI).
I made a discussion post: #579 |
When generating stubs for …, I get: class MyEnum(enum.IntEnum):
    A: MyEnum
    B: MyEnum
    # etc...
What I was expecting was:
class MyEnum(enum.IntEnum):
    A = 0
    B = 1
    # etc...
It appears I can get what I'm expecting by changing line 559 from: self.write_ln(f"{name}: {self.type_str(tp)}") to self.write_ln(f"{name} = {typing.cast(enum.Enum, value).value}") Note that the |
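For reference, the values needed for that output are available through ordinary enum introspection, which is what the changed line relies on; a standalone illustration (MyEnum here is a stand-in, not the actual bound enum):
# Each member of an enum.IntEnum exposes its numeric value via `.value`,
# so a stub writer can emit "NAME = value" lines directly.
import enum

class MyEnum(enum.IntEnum):  # stand-in for the nanobind-bound enum
    A = 0
    B = 1

for name, member in MyEnum.__members__.items():
    print(f"{name} = {member.value}")  # prints "A = 0" and "B = 1"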
My issue was actually related to a missing DLL dependency of my extension module, not to nanobind. I found the dependency using https://github.com/lucasg/Dependencies and installed it in my build environment. Details in #579. |
Now that nanobind 2.0.0 is out, I will close this tracking issue (which has gotten very long). Let's track remaining issues with stub generation using separate tickets. |
Dear all (cc @cansik @torokati44 @qnzhou @tmsrise @rzhikharevich-wmt @njroussel @Speierers),
I am interested in providing a stub generation mechanism as part of nanobind. This is a tracking issue to brainstorm solutions.
Context: @cansik's nanobind-stubgen package is the only solution at the moment and works well in many cases. My goal is to overcome limitations discussed in discussion #163:
Enabling a better "out-of-the-box" experience by integrating stub generation into the CMake build system.
Stub generation currently involves complicated parsing, which is fragile and not always well-defined. Nanobind has this information in a more structured form and could provide it.
Stubs serve two purposes, and stub generation should cater to both needs:
To get autocomplete in VS Code and similar tools, which requires extracting function signatures and docstrings. I am mainly interested in this use case.
Type checkers like MyPy. I haven't used them before and know very little (hence this issue to exchange experience). It seems to me that stubs only need to contain typed signatures but no docstrings. But nanobind often generates type annotations that MyPy isn't happy with, so some sort of postprocessing may be needed.
Here is what I have in mind, before actually having done anything. There may be roadblocks I haven't considered.

CMake integration via a new command, nanobind_add_stubs. This will register a command that is run at install time. Basically we need the whole package to be importable, and doing that in a non-installed build might be tricky. When the user installs the extension to ${CMAKE_INSTALL_PREFIX}, this will run a Python file (shipped as part of the nanobind distribution) that imports the package and then generates nanobind_example/__init__.pyi. Here, I am already getting confused because of unfamiliarity with stub generation. I've seen that packages sometimes contain multiple .pyi files. How does one decide where to put what? Can .pyi files import each other? What would be the best way to expose this in the nanobind_add_stubs() function?

Adapting nanobind's function object (nb_func) so that it exposes information in a more structured way, a bit like __signature__ from inspect.signature. But __signature__ is too limited because it (like Python) has no concept of overload chains. Therefore, I am thinking of adding a function __nb_signature__ that returns a list of pairs of strings [("signature", "docstring"), ...] that the stub generator can turn into something like this.

Some overloads may involve types that lack a usable Python name (e.g. std::__1::vector<Item *>). In that case, the stubs could omit that overload entirely, put some generic placeholder (object?), or put the type name into a string. Thoughts?

Default arguments whose representation (__repr__) might not make sense as a Python expression. This seems like an unsolvable problem because nanobind simply does not know the Python expression to re-create an object. One option would be to try to eval() the expression in the stub generator and omit it or replace it by some kind of placeholder if an issue is found. Not sure -- thoughts?

Some type annotations go beyond typing.*. An example is the nd-array type annotations, which are AFAIK too complex to be handled by anything currently existing. I'm thinking that it could be useful if the stub generator command nanobind_add_stubs(..) could be called with a user-provided Python file that implements some kind of post-process on the type signatures.

I'm curious about your thoughts on this! Thanks!
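As a rough illustration of how a stub generator might consume such a structured __nb_signature__ attribute, here is a minimal sketch; the (signature, docstring) pair format is the one floated above and is an assumption, not nanobind's final API:
# Sketch: render a hypothetical list of (signature, docstring) string pairs
# into an @overload block for a .pyi stub.
def render_overloads(name: str, sigs: list[tuple[str, str]]) -> str:
    lines = []
    for signature, docstring in sigs:
        if len(sigs) > 1:
            lines.append("@overload")
        lines.append(f"def {name}{signature}:")
        lines.append(f'    """{docstring}"""' if docstring else "    ...")
    return "\n".join(lines)

print(render_overloads("f", [
    ("(arg: FantasticVector[int]) -> None", "Docstring 1"),
    ("(arg: FantasticSet[int]) -> None", "Docstring 2"),
]))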