Add type hints to core.py #7707
CI failed because I used |
python-package/xgboost/core.py
Outdated
@@ -27,12 +28,18 @@
 # lesser tested. For now we encourage users to pass a simple list of string.
 FeatNamesT = Optional[List[str]]

 list_or_dict = TypeVar("list_or_dict", List, Dict)
+ArrayLike = np.ndarray | Sequence
I think for now we can just use Any. I have been working on defining the protocol of input data: #7242. We accept a lot more types than ndarray.
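For illustration only, a structural type for input data could look roughly like the sketch below. This is not the protocol from #7242; `ArrayInterfaceLike`, `FakeArray`, and `n_rows` are hypothetical names, and the choice of `shape` and `dtype` as required members is an assumption.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrayInterfaceLike(Protocol):
    """Hypothetical structural type: anything exposing `shape` and `dtype`."""

    @property
    def shape(self) -> Tuple[int, ...]: ...

    @property
    def dtype(self) -> Any: ...


class FakeArray:
    """Toy class used only to exercise the protocol; not a real array."""

    def __init__(self, rows: int, cols: int) -> None:
        self.shape = (rows, cols)
        self.dtype = "float32"


def n_rows(data: ArrayInterfaceLike) -> int:
    # Relies only on the members the protocol promises.
    return data.shape[0]
```

Until something like this exists, `Any` avoids rejecting inputs that are accepted at runtime but are not `np.ndarray`.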
python-package/xgboost/core.py
Outdated
 raise RuntimeError(f"expected {ctype} pointer")
 res = np.zeros(length, dtype=dtype)
-if not ctypes.memmove(res.ctypes.data, cptr, length * res.strides[0]):
+if not ctypes.memmove(res.ctypes.data, cptr, length * res.strides[0]):  # "memmove" does not return a value?
I think that's probably a bug in mypy.
You were right. By the way, the maintainers of typeshed were able to fix this bug in a flash.
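A minimal sketch of why the `if not ctypes.memmove(...)` guard is valid at runtime: as far as I know, `ctypes.memmove` returns the destination address as an integer, so the result is truthy for a valid buffer. The helper name `copy_floats` is hypothetical, and plain ctypes arrays are used here instead of the numpy buffer from core.py.

```python
import ctypes
from typing import List


def copy_floats(values: List[float]) -> List[float]:
    """Copy a C float buffer into a second buffer with memmove."""
    n = len(values)
    src = (ctypes.c_float * n)(*values)
    dst = (ctypes.c_float * n)()
    # memmove hands back the destination address, so a falsy result
    # would indicate a null destination pointer.
    if not ctypes.memmove(dst, src, ctypes.sizeof(src)):
        raise RuntimeError("memmove failed")
    return list(dst)
```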
python-package/xgboost/core.py
Outdated
@@ -790,7 +807,7 @@ def set_uint_info(self, field: str, data) -> None:
 from .data import dispatch_meta_backend
 dispatch_meta_backend(self, data, field, 'uint32')

-def save_binary(self, fname, silent=True) -> None:
+def save_binary(self, fname: str | os.PathLike, silent: bool = True) -> None:
Consider using Union instead. This requires from __future__ import annotations, which has different behavior.
I'm not sure if that different behavior matters (except for the explicit import), but since typing.Union is available for Python <3.10, I agree it's better to stick with that.
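A short sketch of the Union-based spelling, which also runs on Python versions before 3.10. Note that a module-level alias such as `ArrayLike = np.ndarray | Sequence` is an ordinary expression, so the deferred-annotations import would not help there. `normalize_path` is a hypothetical helper, not part of xgboost.

```python
import os
from pathlib import Path
from typing import Union

# Evaluated at import time; `str | os.PathLike` here would raise
# TypeError on Python < 3.10, while Union works on all versions.
PathLike = Union[str, os.PathLike]


def normalize_path(fname: PathLike) -> str:
    """Hypothetical helper: convert a str-based path object to str."""
    return os.fspath(fname)
```

Both `normalize_path("model.bin")` and `normalize_path(Path("model.bin"))` are accepted by the annotation.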
python-package/xgboost/core.py
Outdated
@@ -806,7 +823,7 @@ def save_binary(self, fname, silent=True) -> None:
 c_str(fname),
 ctypes.c_int(silent)))

-def set_label(self, label) -> None:
+def set_label(self, label: Collection) -> None:
Not sure why this is a Collection? Maybe ArrayLike?
My bad; I was confused by dispatch_meta_backend, but apparently I missed the docstring...
The CI build still failed due to |
I was able to eliminate most mypy errors (from 60+ to 19; hopefully that was not because I introduced too many |
It turns out we cannot include Maybe we should separate the definitions of types from source code by putting them in a stub file? |
Apologies for the slow response, will come back to this tomorrow.
That sounds like a good idea, we need to have some stubs later.
@trivialfis Thanks for following up! I've done a quick demo using stub (basically I just used Some issues:
|
Thank you for experimenting with this! Does it make sense to simply use |
@trivialfis Simply using Let me move on with inline annotation without specific types for C pointers for now. |
Excellent!
Hi, please let me know when it's ready for review. ;-)
Hi @trivialfis, after some of our work, there are still 22 mypy errors in core.py but I find them quite tricky to tackle. The errors can be generated with
Besides the errors that relate to ctypes, some errors are due to the flexible user interface. For example, the setter While I don't think simply adding |
Thank you for the nice work. One comment inlined.
python-package/xgboost/_typing.py
Outdated
ArrayLike = Any
PathLike = Union[str, os.PathLike]
CupyT = ArrayLike # maybe need a stub for cupy arrays
NdarrayOrCupyT = Union[np.ndarray, CupyT]
I think we might be better off returning Any for prediction. The return type is actually a dependent type, which means that given the input type, the return type is known, so it's not a union. From the point of view of a statically typed language (like C++ or Rust), the type held by a union can change at runtime, but the return type is determined at compile time.
mypy doesn't support dependent types yet; you can find some related feature requests in their issue list. To provide a concrete example, if users pass a cupy array as input to inplace_predict, then they know the output is also a cupy array. However, with Union, users will be asked to handle both numpy arrays and cupy arrays.
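One way to approximate "the output type follows the input type" without dependent types is a constrained TypeVar, in the same style as `list_or_dict` in core.py. The sketch below is only an illustration: `list` and `tuple` are stand-ins for numpy and cupy arrays, and `predict_sketch` is a hypothetical name.

```python
from typing import TypeVar

# Stand-ins: list plays the role of numpy.ndarray, tuple that of cupy.ndarray.
ArrayT = TypeVar("ArrayT", list, tuple)


def predict_sketch(data: ArrayT) -> ArrayT:
    """Toy 'prediction' that doubles each value and keeps the container type."""
    return type(data)(2 * v for v in data)
```

With this signature a type checker infers a `tuple` result for a `tuple` argument, so callers would not need to narrow a Union.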
I agree. By the way, should we use a string literal for cupy types, e.g. "cupy.ndarray" instead of CupyT, here? https://github.com/bridgream/xgboost/blob/mypy/python-package/xgboost/core.py#L276
@bridgream I think it will show an error, as mypy couldn't find cupy in the test environment (cupy is an optional dependency).
On second thought, overloading might help, but the change might be tedious as cupy is an optional dependency.
Maybe we should follow your solution in this PR by creating an interface. Since xgboost only touches some basic interface, this shouldn't take too much work. We could then introduce @overload to achieve better hints.
I can try that next.
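A rough sketch of how @overload could tie the return type to the input type. `HostArray` and `DeviceArray` are hypothetical stand-ins for numpy and cupy arrays (cupy is not imported here), and `inplace_predict_sketch` is not the real xgboost method.

```python
from typing import List, Union, overload


class HostArray:
    """Hypothetical stand-in for numpy.ndarray."""

    def __init__(self, values: List[float]) -> None:
        self.values = values


class DeviceArray:
    """Hypothetical stand-in for cupy.ndarray."""

    def __init__(self, values: List[float]) -> None:
        self.values = values


@overload
def inplace_predict_sketch(data: HostArray) -> HostArray: ...
@overload
def inplace_predict_sketch(data: DeviceArray) -> DeviceArray: ...


def inplace_predict_sketch(
    data: Union[HostArray, DeviceArray],
) -> Union[HostArray, DeviceArray]:
    # Return the same container type as the input; doubling is a dummy "model".
    return type(data)([2.0 * v for v in data.values])
```

A type checker picks the matching overload, so a `DeviceArray` argument yields a `DeviceArray` result without the caller handling both cases.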
Thank you for the nice work! Will merge as part of 1.6.
Great, thank you! I'll continue to improve type hints in other Python scripts.
I attempted to add type hints to core.py. I wasn't able to eliminate all mypy errors, but I hoped to make incremental changes because adding type hints seemed very intrusive and could easily trigger conflicts.
Related to #6496