Version 0.7.0 breaks pyam test #50
Comments
I'm not sure this is ixmp4's fault. My interpretation is that
/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/ixmp4/data/db/meta/repository.py:194: in bulk_upsert
    super().bulk_upsert(type_df)
/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/ixmp4/data/db/base.py:376: in bulk_upsert
    self.bulk_upsert_chunk(df)
/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/ixmp4/data/db/base.py:394: in bulk_upsert_chunk
    cond.append(df[col] != df[updated_col])
and
E pyarrow.lib.ArrowNotImplementedError: Function 'not_equal' has no kernel matching input types (string, double)
pyarrow/error.pxi:91: ArrowNotImplementedError
indicate that |
I've tracked the error down for the most part, but I'm not sure how to resolve it yet. As far as I can tell, the following is happening: Line 122 in 922f958
   key     value  run__id
0  number  1      1
1  string  foo    1
This survives until ixmp4/ixmp4/data/db/meta/repository.py Line 186 in 922f958
   key     value  run__id  type
0  number  1      1        Type.INT
1  string  foo    1        Type.STR
But then, we call
   run__id  key     type  value_int  value_str  value_float  value_bool  id
0  1        number  INT   1          None       None         None        1
here: Line 388 in 922f958
And for some reason, the comparison then fails. We want to insert a value of type
@meksor, @danielhuppmann, if you have experience with this or immediately know what to do, please jump in here. Otherwise, I'll find a fix tomorrow. |
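The first step described above (attaching a type to each key/value pair) can be sketched as follows. This is only an illustration: the `Type` enum and the `infer_type` helper are hypothetical stand-ins, not ixmp4's actual code, and the frame simply mirrors the table shown in the comment.

```python
from enum import Enum

import pandas as pd


class Type(Enum):
    """Hypothetical stand-in for the meta value types seen in the tables."""

    INT = "INT"
    STR = "STR"
    FLOAT = "FLOAT"
    BOOL = "BOOL"


def infer_type(value: object) -> Type:
    """Map a Python value to a Type member (illustrative helper)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return Type.BOOL
    if isinstance(value, int):
        return Type.INT
    if isinstance(value, float):
        return Type.FLOAT
    if isinstance(value, str):
        return Type.STR
    raise TypeError(f"unsupported meta value type: {type(value).__name__}")


# Same content as the first table: mixed int and str values in one column.
meta = pd.DataFrame(
    {"key": ["number", "string"], "value": [1, "foo"], "run__id": [1, 1]}
)
meta["type"] = [infer_type(v) for v in meta["value"]]
```

Because the `value` column holds both an integer and a string, it can only be stored with a generic dtype at this stage, so the per-type information lives in the `type` column until the values are split into typed columns.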
Thanks a lot for the detective work @glatterf42. |
What /exactly/ is pyam trying to pass to ixmp4 as meta values? |
OK so that becomes a dict like: |
And the pyam-ixmp4-test passed last week - so it must be either the pandas-update yesterday or some ixmp4 change since v0.6... |
I'm running it locally with pandas 2.2.1 and the newest ixmp4 version... |
OK, update: if I install pyarrow /alongside/ ixmp4, the tests in ixmp4 also fail. It seems pandas uses pyarrow if it's available, which breaks this test. |
OK, so pandas version 1.5.3 still works. It seems pandas versions >= 2 change their behaviour if pyarrow is installed. Converting the columns to each other's types just yields another pyarrow error. |
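Since the failure seems to depend on both the pandas version and whether pyarrow is importable, it may help to record both when reproducing. A minimal sketch; `environment_report` is a hypothetical helper name, not part of ixmp4 or pyam.

```python
import importlib.util

import pandas as pd


def environment_report() -> dict[str, object]:
    """Collect the two facts this thread suspects are relevant."""
    return {
        "pandas": pd.__version__,
        # find_spec returns None when the package cannot be imported.
        "pyarrow_installed": importlib.util.find_spec("pyarrow") is not None,
    }


report = environment_report()
print(report)
```

Running this in the failing CI environment and in a working local one would show whether the two differ in either respect.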
We bumped pyam to depend on pandas >= 2.0 a while ago to take advantage of the improved speed and API, so pinning pandas<2 isn't really an option... |
I've added some auxiliary output to
def bulk_upsert_chunk(self, df: pd.DataFrame) -> None:
    columns = db.utils.get_columns(self.model_class)
    df = df[list(set(columns.keys()) & set(df.columns))]
    existing_df = self.tabulate_existing(df)
    print(f"df's dtypes: \n {df.dtypes}")
    if existing_df.empty:
        self.bulk_insert(df)
    else:
        df = self.merge_existing(df, existing_df)
        df["exists"] = np.where(pd.notnull(df["id"]), True, False)
        print(f"existing df \n {existing_df}")
        print(f"existing df's dtypes: \n {existing_df.dtypes}")
        print(f"new df's dtypes: \n {df.dtypes}")
And the corresponding output shows this:
df's dtypes:
value_bool object
value_float object
value_str object
key object
run__id int64
value_int object
type object
dtype: object
existing df
run__id key type value_int value_str value_float value_bool id
0 1 number INT 1 None None None 1
existing df's dtypes:
run__id int64
key object
type object
value_int int64
value_str object
value_float object
value_bool object
id int64
dtype: object
new df's dtypes:
run__id int64
value_bool string[pyarrow]
value_float string[pyarrow]
value_str string[pyarrow]
key string[pyarrow]
value_int string[pyarrow]
type string[pyarrow]
type_y string[pyarrow]
value_int_y float64
value_str_y string[pyarrow]
value_float_y string[pyarrow]
value_bool_y string[pyarrow]
id float64
exists bool
dtype: object
So it looks like one of these
df = self.merge_existing(df, existing_df)
df["exists"] = np.where(pd.notnull(df["id"]), True, False)
is responsible for the data conversion to unexpected types. |
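One part of the dtype change is plain pandas behaviour and can be reproduced without pyarrow: a left merge that leaves some rows unmatched fills them with NaN, which upcasts int64 columns to float64 (compare `value_int_y` and `id` in the output above). A minimal sketch, assuming `merge_existing` performs something like a left merge (the thread does not show its body); `dtype_changes` and the toy frames are hypothetical.

```python
import pandas as pd


def dtype_changes(
    before: pd.DataFrame, after: pd.DataFrame
) -> dict[str, tuple[str, str]]:
    """Return {column: (dtype before, dtype after)} for columns whose dtype changed."""
    changes = {}
    for col in after.columns:
        if col in before.columns and before[col].dtype != after[col].dtype:
            changes[col] = (str(before[col].dtype), str(after[col].dtype))
    return changes


# Toy version of the existing row from the database.
existing_df = pd.DataFrame(
    {"run__id": [1], "key": ["number"], "value_int": [1], "id": [1]}
)
# Toy version of the incoming rows; the "string" key has no existing match.
new_df = pd.DataFrame({"run__id": [1, 1], "key": ["number", "string"]})

# Assumed shape of the merge step: keep all incoming rows, attach matches.
merged = new_df.merge(existing_df, how="left", on=["run__id", "key"])

print(dtype_changes(existing_df, merged))
```

This explains the float64 columns only. It does not by itself account for the `string[pyarrow]` columns, which are the ones that make the `!=` comparison fail.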
It's happening in
dask/dask#10631 might be related. |
As I was working on pyam yesterday (IAMconsortium/pyam#818) I noticed that ixmp4 0.7.0 broke the test test_ixmp4_integration[test_df0] from tests/test_ixmp4.py (https://github.com/IAMconsortium/pyam/actions/runs/8067457546/job/22037916444) with the following error:
Looks like it's got something to do with pyarrow.
Reverting to ixmp4 version 0.6.0 fixed the test.
Just talked to @meksor and he said that you, @glatterf42, would be the best person to take a look.
FYI @danielhuppmann.