CLN: Add typing for dtype argument in io/sql.py #38680
one comment
pandas/core/generic.py
```diff
@@ -2639,7 +2643,7 @@ def to_sql(
         index: bool_t = True,
         index_label=None,
         chunksize=None,
-        dtype=None,
+        dtype: DtypeArg = None,
```
Optional, i.e. `dtype: Optional[DtypeArg] = None`
pandas/core/arrays/base.py
```diff
@@ -211,7 +211,9 @@ def _from_sequence(cls, scalars, *, dtype=None, copy=False):
         raise AbstractMethodError(cls)
 
     @classmethod
-    def _from_sequence_of_strings(cls, strings, *, dtype=None, copy=False):
+    def _from_sequence_of_strings(
+        cls, strings, *, dtype: Optional[Dtype] = None, copy=False
```
copy: bool
This will probably take some time, since mypy reports multiple errors. To give you an idea of one such issue:
The problem is that above line 1947 we already check whether `dtype` is a dict or not, so a string can never reach line 1947, but mypy still reports it as an issue. I think it is related to overwriting variables with a value of a different type.
@jreback, I'm a bit stuck here. According to python/mypy#1174, mypy does not like overwriting variables with values of a different type, and we do that a lot in pandas; see the example below. Any ideas how to solve this, or should I remove the dtype annotations until we have a better solution?

```python
def to_sql(
    self,
    frame,
    name,
    if_exists="fail",
    index=True,
    index_label=None,
    schema=None,
    chunksize=None,
    dtype: Optional[DtypeArg] = None,
    method=None,
):
    if dtype and not is_dict_like(dtype):
        dtype = {col_name: dtype for col_name in frame}
    if dtype is not None:
        # mypy error here, since not all types in Optional[DtypeArg] (e.g. str) have an `items` attribute
        for col, my_type in dtype.items():
            if not isinstance(my_type, str):
                raise ValueError(f"{col} ({my_type}) not a string")
```
@avinashpancham this is why we need to do this in a small, incremental way to avoid issues, i.e. just do one class of things, then do the others in another PR. You cannot reuse the same variable name if its type changes. But again: small, incremental PRs, otherwise these will get bogged down.
Another option would be to modify the code in each of these cases so that we don't overwrite variables (see below). But this would take time and would make this PR far too long. I would then propose to close this PR and open an issue listing all the files that need to be changed, so that people can contribute on a per-file basis.

```python
def to_sql(
    self,
    frame,
    name,
    if_exists="fail",
    index=True,
    index_label=None,
    schema=None,
    chunksize=None,
    dtype: Optional[DtypeArg] = None,
    method=None,
):
    # Store the dict form under a new name instead of overwriting dtype
    if dtype and not is_dict_like(dtype):
        dtype_dict = {col_name: dtype for col_name in frame}
    else:
        dtype_dict = dtype
    if dtype_dict is not None:
        for col, my_type in dtype_dict.items():
            if not isinstance(my_type, str):
                raise ValueError(f"{col} ({my_type}) not a string")
```
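A minimal standalone sketch of the "don't overwrite the variable" idea, assuming simplified, hypothetical `DtypeSketch` / `DtypeArgSketch` aliases (the real pandas aliases also cover NumPy and extension dtypes) and a hypothetical helper name. Here `isinstance` is used for the branch because mypy can narrow a `Union` on it, and the per-column result gets its own explicitly annotated name, so `dtype` keeps its declared type throughout:

```python
from typing import Dict, Hashable, Iterable, Optional, Union

# Hypothetical, simplified stand-ins for pandas' Dtype / DtypeArg aliases
DtypeSketch = Union[str, type]
DtypeArgSketch = Union[DtypeSketch, Dict[Hashable, DtypeSketch]]


def check_sql_dtypes(
    columns: Iterable[Hashable], dtype: Optional[DtypeArgSketch] = None
) -> Optional[Dict[Hashable, DtypeSketch]]:
    """Hypothetical helper: build a per-column dtype dict without reassigning dtype."""
    dtype_dict: Optional[Dict[Hashable, DtypeSketch]]
    if dtype is None:
        dtype_dict = None
    elif isinstance(dtype, dict):
        # isinstance narrows the Union, so mypy knows dtype is a Dict here
        dtype_dict = dtype
    else:
        # dtype is narrowed to a single dtype: apply it to every column
        dtype_dict = {col_name: dtype for col_name in columns}

    # dtype is never reassigned; only dtype_dict is iterated
    if dtype_dict is not None:
        for col, my_type in dtype_dict.items():
            if not isinstance(my_type, str):
                raise ValueError(f"{col} ({my_type}) not a string")
    return dtype_dict
```

For example, `check_sql_dtypes(["a", "b"], "TEXT")` yields one `"TEXT"` entry per column. Note that `isinstance(dtype, dict)` only accepts real dicts, which is exactly the trade-off the review discusses further down.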
Ah, I see we are thinking the same thing. I will then open an issue listing all the files that need changing, so that other people can contribute too, and limit this PR to just the io/sql.py file.
Branch updated from e9b49c6 to e1bc42a.
pandas/io/sql.py
```diff
@@ -1483,7 +1496,7 @@ def to_sql(
         if dtype and not is_dict_like(dtype):
             dtype = {col_name: dtype for col_name in frame}
 
-        if dtype is not None:
+        if dtype is not None and isinstance(dtype, dict):
```
Interesting that you have to do this, as L1496 explicitly converts this.
So `is_dict_like` will pass through a Series, for example, which will then fail in other places that we are likely not testing. I would change L1499 to:

```python
if dtype is not None:
    if not is_dict_like(...):
        ...
    else:
        dtype = dict(dtype)
```
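The suggested normalization can be sketched without pandas by using the standard library's `MappingProxyType` as a stand-in for a Series: another object that is dict-like but is not a `dict` instance. `is_dict_like_sketch` and `normalize_dtype` are hypothetical names, and the check is a simplified model of `pandas.api.types.is_dict_like`, not the library function itself:

```python
from types import MappingProxyType
from typing import Any, Dict, Hashable, Iterable, Optional


def is_dict_like_sketch(obj: object) -> bool:
    # Simplified duck-typing check modeled on pandas.api.types.is_dict_like:
    # any non-class object with __getitem__, keys and __contains__ qualifies,
    # which is why a pandas Series passes it as well.
    dict_like_attrs = ("__getitem__", "keys", "__contains__")
    return all(hasattr(obj, attr) for attr in dict_like_attrs) and not isinstance(
        obj, type
    )


def normalize_dtype(
    dtype: Any, columns: Iterable[Hashable]
) -> Optional[Dict[Hashable, Any]]:
    # Hypothetical helper following the shape of the suggested change
    if dtype is None:
        return None
    if not is_dict_like_sketch(dtype):
        # Single dtype: apply it to every column
        return {col_name: dtype for col_name in columns}
    # Dict-like but possibly not a dict: copy it into a plain dict so that
    # downstream code only ever sees a real dict
    return dict(dtype)


# A read-only mapping is dict-like but is not a dict instance
proxy = MappingProxyType({"a": "TEXT"})
result = normalize_dtype(proxy, ["a"])
```

Converting with `dict(...)` once, at the entry point, means later code can rely on plain-dict behavior instead of every consumer having to cope with arbitrary dict-likes.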
> Interesting that you have to do this, as L1496 explicitly converts this.

Yes, the problem is that we already define the type of `dtype` in the function signature at L1452. After that you cannot change the type of `dtype`, not even by overwriting the variable; `isinstance` checks are the only way to narrow it down. So the proposed solution (see below) will not work, since we are overwriting the `dtype` variable:
```python
if dtype is not None:
    if not is_dict_like(dtype):
        dtype = {col_name: dtype for col_name in frame}
    else:
        dtype = dict(dtype)  # This line gives a mypy error
```

Mypy error:

```
error: Argument 1 to "dict" has incompatible type "Union[ExtensionDtype, Any, str, Type[object], Dict[Optional[Hashable], Union[ExtensionDtype, str, Any, Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object]]]]"; expected "Mapping[Any, Any]" [arg-type]
```
> `isinstance` checks are the only way to narrow it down.

You can also use `cast` or `assert`. In this case the `is_dict_like` function will not narrow types, so it is OK to use a `cast` following the `is_*` call, something like:

```python
if is_dict_like(dtype):
    dtype = cast(dict, dtype)
    ...
else:
    ...
```

You can always replace `dict` with `Mapping`, or with a `Union` if more than `dict` is accepted for a dict-like parameter.
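A runnable sketch of that `cast`-after-check pattern, assuming the same kind of simplified, hypothetical aliases and a hypothetical `is_dict_like_sketch` stand-in that, like the pandas function, returns a plain `bool` and therefore gives mypy no narrowing information. It casts to a parameterized `Dict` (a member of the declared `Union`) rather than bare `dict`:

```python
from typing import Dict, Hashable, Iterable, Optional, Union, cast

# Hypothetical, simplified stand-ins for pandas' Dtype / DtypeArg aliases
DtypeSketch = Union[str, type]
DtypeArgSketch = Union[DtypeSketch, Dict[Hashable, DtypeSketch]]


def is_dict_like_sketch(obj: object) -> bool:
    # Returns a plain bool, so a type checker learns nothing about obj from it
    return all(
        hasattr(obj, attr) for attr in ("__getitem__", "keys", "__contains__")
    ) and not isinstance(obj, type)


def to_dtype_dict(
    dtype: Optional[DtypeArgSketch], columns: Iterable[Hashable]
) -> Optional[Dict[Hashable, DtypeSketch]]:
    if dtype is None:
        return None
    if is_dict_like_sketch(dtype):
        # cast() does nothing at runtime; it tells mypy which Union member
        # applies, and the assignment narrows dtype for the rest of the branch
        dtype = cast(Dict[Hashable, DtypeSketch], dtype)
        return dict(dtype)
    # Not dict-like: treat it as one dtype shared by all columns
    single = cast(DtypeSketch, dtype)
    return {col_name: single for col_name in columns}
```

Because `cast` performs no runtime check, it is only safe right after a check such as `is_dict_like_sketch` has already established what the value looks like.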
Thanks @simonjayhawkins, I did not know that the `cast` function also helps in narrowing the type down. With the `cast` function it works.
pandas/io/sql.py
```
if col.name in dtype:
    return self.dtype[col.name]
dtype: DtypeArg = self.dtype or {}
if isinstance(dtype, dict) and col.name in dtype:
```
This should be `is_dict_like`.
pandas/io/sql.py
```diff
-    dtype = self.dtype or {}
-    if col.name in dtype:
+    dtype: DtypeArg = self.dtype or {}
+    if isinstance(dtype, dict) and col.name in dtype:
```
use is_dict_like
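To see why the reviewer prefers `is_dict_like` in this lookup, a hypothetical column lookup modeled on the changed lines in the diff above can be run against a dict-like object that is not a `dict`. The function names are illustrative, `MappingProxyType` stands in for a non-dict mapping, and the check is a simplified model of pandas' `is_dict_like`:

```python
from types import MappingProxyType
from typing import Any, Hashable, Optional


def is_dict_like_sketch(obj: object) -> bool:
    # Simplified model of pandas.api.types.is_dict_like
    dict_like_attrs = ("__getitem__", "keys", "__contains__")
    return all(hasattr(obj, attr) for attr in dict_like_attrs) and not isinstance(
        obj, type
    )


def lookup_isinstance(dtype: Any, col_name: Hashable) -> Optional[Any]:
    # Version under review: only real dict instances are consulted
    dtype = dtype or {}
    if isinstance(dtype, dict) and col_name in dtype:
        return dtype[col_name]
    return None


def lookup_dict_like(dtype: Any, col_name: Hashable) -> Optional[Any]:
    # Suggested version: any dict-like mapping is consulted
    dtype = dtype or {}
    if is_dict_like_sketch(dtype) and col_name in dtype:
        return dtype[col_name]
    return None


proxy = MappingProxyType({"a": "TEXT"})
```

With the `isinstance(dtype, dict)` check, the explicit dtype for column `a` in `proxy` is silently ignored; the duck-typed check finds it.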
thanks @avinashpancham
Thanks @jreback, learnt some new mypy things when working on this PR :) Will make a general issue to also update the typing of dtype in the remainder of the codebase.
kk great thanks!
Follow-up PR for #37546:

- Added typing `Optional[DtypeArg]` for the `dtype` arg in pandas/io, since those functions accept both single values and dicts as `dtype` args
- Added typing `Optional[Dtype]` for the `dtype` arg in pandas/core, since those functions only accept a single value as `dtype` arg
- Added typing `Optional[NpDtype]` for the `dtype` arg in pandas/core for functions that only accept NumPy dtypes as `dtype` arg

- closes #xxxx
- tests added / passed
- passes `black pandas`
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry
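The difference between the annotations listed above can be sketched with simplified, stdlib-only aliases. The `*Sketch` names and the two functions are hypothetical illustrations, not the definitions in `pandas/_typing.py`; judging by the mypy error quoted in the review, the real `Dtype` also includes `ExtensionDtype` and NumPy dtypes:

```python
from typing import Dict, Hashable, Optional, Type, Union

# Hypothetical, simplified versions of the aliases named in the description.
# ExtensionDtype and np.dtype are omitted to keep this stdlib-only.
DtypeSketch = Union[str, Type[Union[str, float, int, complex, bool, object]]]
DtypeArgSketch = Union[DtypeSketch, Dict[Optional[Hashable], DtypeSketch]]
NpDtypeSketch = str  # assumption: the real NpDtype also accepts np.dtype objects


def io_style(dtype: Optional[DtypeArgSketch] = None) -> str:
    # pandas/io-style parameter: one dtype or a per-column mapping
    if dtype is None:
        return "default"
    return "per-column" if isinstance(dtype, dict) else "single"


def core_style(dtype: Optional[DtypeSketch] = None) -> str:
    # pandas/core-style parameter: a single dtype only
    return "default" if dtype is None else "single"
```

Only the io-style annotation admits a dict, which is why the pandas/io functions need the wider `DtypeArg` while pandas/core can use `Dtype` (or `NpDtype` where only NumPy dtypes are valid).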